Abstract

Optimal behavior in (competitive) situations is traditionally determined
with the help of utility functions that measure the payoff of different actions.
Given an ordering on the space of revenues (payoffs), the classical
axiomatic approach of von Neumann and Morgenstern establishes the existence
of suitable utility functions, and leads to game theory as the most
prominent materialization of a theory to determine optimal behavior. Although
this appears to be a most natural approach to risk management too,
applications in critical infrastructures often violate the implicit assumption that
actions lead to deterministic consequences. The gameplay in a critical
infrastructure risk control competition is intrinsically random
in the sense of actions having uncertain consequences. Mathematically, this
takes us to utility functions that are probability-distribution-valued, in which
case we lose the canonical (in fact every possible) ordering on the space of
payoffs, and the original techniques of von Neumann and Morgenstern no
longer apply.

This work introduces a new kind of game in which uncertainty applies
to the payoff functions rather than to the players' actions (the latter setting has
been widely studied in the literature, leading to celebrated notions like the
trembling-hand equilibrium or the purification theorem). In detail, we show
how to fix the non-existence of a (canonical) ordering on the space of probability
distributions by only mildly restricting the full set to a subset that
can be totally ordered. Our vehicle to define the ordering and establish basic
game theory is non-standard analysis and hyperreal numbers.
65-67, 9020 Klagenfurt, Austria. This work has been done in the course of consultancy for the EU
Project HyRiM (Hybrid Risk Management for Utility Networks; see https://hyrim.net), led by the
Austrian Institute of Technology (AIT; www.ait.ac.at). See the acknowledgement section.
1 Introduction


Security risk management is a continuous cycle of action and reaction to the changing 
working conditions of an infrastructure. This cycle is detailed in relevant standards 
like ISO 2700x, where phases designated to planning, doing, checking and acting are 
rigorously defined and respective measures are given. 

Our concern in this report is an investigation of a hidden assumption underneath 
this recommendation, namely the hypothesis that some wanted impact can be achieved 
by taking the proper action. If so, then security risk management would degenerate 
to a highly complex but nevertheless deterministic control problem, to which optimal 
solutions and strategies could be found (at least in theory). 

Unfortunately, reality is intrinsically random to some extent, and the outcome
of an action is almost never fully certain. An illustrative example is how public
opinion and trust depend on the public relations strategy of an institution. While
there are surely ways to influence public opinion, it will ultimately always remain
outside one’s full and exclusive control. Regardless of this, we ought to find optimal ways
to influence the situation in the way we like. In theory, this can again be boiled
down to a (not so simple) optimization problem, but one that optimizes
partially random outcomes. This is where things start to get nontrivial.

Difficulties in the defense against threats are rooted in the nature of the relevant attacks, since
not all of them are immediately observable or induce instantly noticeable or measurable
consequences. Indeed, the best we can do is to find an optimal protection against an
a-priori identified set of attack scenarios, so as to gain assurance of security against the
known list of threat scenarios. Optimizing this protection is often, but not necessarily,
tied to some kind of adversary modeling, in an attempt to sharpen our expectations
about what may happen to us. Such adversary modeling is inevitably error-prone,
as the motives and incentives of an attacker may deviate from our imagination to an
arbitrary extent.

Approaching the problem mathematically, there are two major lines of decision making.
The first works with an a-priori hypothesis about the current situation and incorporates
current information into an a-posteriori model that tells us how things will evolve, and
specifically, which events are more likely than others, given the full information that
we have. Decision making in that case means that we seek the optimal behavior so
as to master a specifically expected setting (described by the a-posteriori distribution).
This is the Bayesian approach to decision making (see [10] for a comprehensive and
detailed treatment). The second way of decision making explicitly avoids any hypothesis about the
current situation and seeks an optimal behavior against any possible setting. Unlike
in the Bayesian perspective, we would thus intentionally and completely ignore all available
data and choose our actions to master the worst-case scenario. While this so-called
minimax decision making is obviously a more pessimistic and cautious approach, it appears
better suited for risk management in situations where data is either not available, not
trustworthy or inaccurate.

For this reason, we will hereafter pursue the minimax approach and dedicate section
4.2 to a discussion of how this fits into the Bayesian framework as a special case.
We assume that the risk manager can repeatedly take actions and that the possible
actions are finitely many. Furthermore, we assume that the adversary against which we
do our risk control also has a finite number of possible ways to cause trouble. In terms
of an ISO 2700x risk management process, the risk manager’s actions would instantiate
controls, while the adversary’s actions would correspond to identified threat scenarios.
The assumption of finiteness does not stringently constrain us here, as an infinite number of
actions to choose from may in any case overstrain a human decision-maker.

The crucial point in all that follows is that any action (as taken by the decision maker)
in any situation (action taken by the adversary) may have an intended but in any case
random outcome. To properly formalize this and fit it into a mathematical, in fact
game-theoretic, framework, we hereafter associate the risk manager with player 1 in our
game, who competes with player 2, the adversary. In the game-theoretic literature, the
actions of either player are referred to as pure strategies, the entirety of which will be
abbreviated as $PS_1$ and $PS_2$ for the respective player. So $PS_1$ comprises all actions
(hereafter called strategies) available for risk management, while $PS_2$ comprises all trouble scenarios.
For our treatment, it is not required to be specific about what the elements of both action
sets look like, as it is sufficient for them to “be available”.

Let $PS_1, PS_2$ denote finite sets of strategies for two players, where player 1 is the
honest defender (e.g., utility infrastructure provider), and player 2 is the adversary. We
assume player 1 to be unaware of its opponent's incentives, so that an optimal strategy is
sought against any possible behavior within the known action space $PS_2$ of the opponent
(rational or irrational, e.g., nature).

In this sense, $PS_2$ can be the set of all known possible security incidents, whose
particular incarnations can become reality by the adversary’s action. To guard its assets,
player 1 can choose from a finite set of actions $PS_1$ to minimize the costs of a recovery
from any incident, or equivalently, keep its risk under control.

Upon these assumptions, the situation can be described by an $(n \times m)$-matrix of
scenarios, where $n = |PS_1|$ and $m = |PS_2|$, each of which is associated with some cost
$R_{ij}$ to recover the system from a malfunctioning state back to normal operation from
scenario $(i,j) \in PS_1 \times PS_2$. We use the variable $R_{ij}$ henceforth to denote the cost of a
repair made necessary by an incident $j \in PS_2$ happening when the system is currently
in configuration $i \in PS_1$.

The process of risk management will be associated with player 1 putting the system
into different configurations over time in order to minimize the risk $R_{ij}$.

Remark 1.1 We leave the exact understanding of “risk” or “damage” intentionally 
undefined here, as this will be quite different between various utility infrastructures or 
general fields of application. 

Remark 1.2 Neither the set $PS_1$ nor the set $PS_2$ is specified here in any more detail
than declaring it as an “action space”. The reason is, again, the expected diversity of
actions and incidents among various fields of application (or utility infrastructures).
To keep this report general and not limit the applicability of the
results to follow, we leave the precise elements of $PS_1, PS_2$ to definitions that
are tailored to the intended application.
Examples of strategies may include:

• random spot checks in the system to locate and fix problems (ultimately, to keep
the system running),

• random surveillance checks at certain locations,

• certain efforts or decisions about whether or not, and which, risks or countermeasures
shall be communicated to the society or user community,

• etc.

In real-life settings, it can be expected that an action (regardless of who takes it)
always has some intrinsic randomness. That is, the effect of a particular scenario $(i,j) \in
PS_1 \times PS_2$ is actually a random variable $R_{ij}$, having only some “expected” outcome that
may be different between any two occurrences of the same situation $(i,j)$ over time.

To handle the arising random variables properly, let us think of them as
modeling not the benefits but rather the damage that a security incident may cause. In
this view, we can go for minimization of an expectedly positive value that measures the
cost of a recovery. Formally, we introduce the following assumption, which will greatly
ease theoretical technicalities throughout this work, while not limiting the practicability
too much.

The family $\{R_{ij} : i \in PS_1, j \in PS_2\}$ of random damages in our game is
assumed to have all members satisfying the following assumption:

Assumption 1.3 Let $R_{ij}$ be a real-valued random variable. On $R_{ij}$, we impose the
following assumptions:

• $R_{ij} \geq 1$ (w.l.o.g.; see footnote 1).

• $R_{ij}$ has a known distribution $F_{ij}$ with compact support (note that this implies that
every $R_{ij}$ is upper-bounded).

• The probability measure induced by $F_{ij}$ is either discrete or continuous and has
a density function $f_{ij}$. For continuous random variables, the density function is
assumed to be continuous.

1.1 Symbols and Notation 

This section is mostly intended to refresh the reader’s memory about some basic but 
necessary concepts from calculus and probability theory that we will use in the following 
to develop the theoretical groundwork. This subsection can thus be safely skipped and 
may be consulted whenever necessary to clarify details. 

1 It is common to assume losses to be $\geq 0$; our modification has technical reasons, but causes no semantic
difference in the comparisons between two loss densities, since both loss variables are just shifted by
the same amount. Also, the loss can (w.l.o.g.) be scaled until losses in the range $[0,1)$ become
practically negligible.
General Symbols: Sets, random variables and probability distribution functions are
denoted as upper-case letters like $X$ or $F$. Matrices and vectors are denoted as bold-face
upper- and lower-case letters, respectively. For finite sets, we write $|X|$ for the number
of elements (cardinality). For real values, $|a|$ denotes the absolute value of $a \in \mathbb{R}$. For
arbitrary sets, the symbol $X^k$ is the $k$-fold cartesian product of $X$; the set $X^\infty$ thus
represents the collection of all infinite sequences $(a_1, a_2, a_3, \ldots)$ with elements from $X$.
We denote such a sequence as $(a_n)_{n \in \mathbb{N}}$.

If $X$ is a random variable, then its probability distribution $F_X$ is indicated by the notation
$X \sim F_X$. Whenever this is clear from the context, we omit the subscript to $F$ and write
$X \sim F$ only. If $X$ lives on a discrete set, then we call $X$ a discrete random variable.
Otherwise, if $X$ takes values in some infinite and uncountable set, such as $\mathbb{R}$, then we
call $X$ a continuous random variable. For discrete distributions, we may also use the
vector $p$ of probabilities of each event to denote the distribution of the discrete variable
$X$ as $X \sim p$.

Calligraphic letters denote families (sets) of sets or probability distributions, e.g.,
ultrafilters (defined below) are denoted as $\mathcal{U}$, and the family of all probability distributions
is denoted as $\mathcal{F}$. The family of subsets of a set $A$ is written as $\mathcal{P}(A)$ (the power-set
of $A$). If $F \in \mathcal{F}$ is a probability distribution, then its density (provided it exists) is
denoted by the respective lower-case letter $f$.

Topology and Norms: As our considerations in section 3 will heavily rely on concepts of 
continuity and compactness or openness of sets, we briefly review the necessary concepts 
now. 

A set $A$ is called open if for every $x \in A$ there is another open set $B \subseteq A$ that contains
$x$. The family $\mathcal{T}$ of all open sets is characterized by the property of being closed under
arbitrary (also infinite) unions and finite intersections. Such a family is called a topology, and the set $X$
together with a topology $\mathcal{T} \subseteq \mathcal{P}(X)$ is called a topological space. A set $A$ is called
closed if its complement (w.r.t. the space $X$) is open.

In $\mathbb{R}$, the open intervals are all of the form $\{x : a < x < b\}$ for
$a, b \in \mathbb{R}$ and $a < b$. We denote these intervals by $(a, b)$, and the topology on $\mathbb{R}$ is the one
generated by them, i.e., its open sets are the unions of such intervals. Note that the existence of a
total ordering $<$ on a space always induces the so-called order topology, whose open sets are defined
in exactly the aforementioned way.
Closed intervals are denoted by square brackets, $[a, b] = \{x : a \leq x \leq b\}$. A set $X \subseteq \mathbb{R}$
is called bounded if there are two finite constants $a, b$ so that all $x \in X$ satisfy $a \leq x \leq b$.
A subset of $\mathbb{R}$ is compact if and only if it is closed and bounded.

For $(X, d_X)$, $(Y, d_Y)$ being two metric spaces, we call a function $f : X \to Y$ continuous
if for every $x_0 \in X$ and every $\varepsilon > 0$ there is some $\delta > 0$ for which $d_X(x_0, y) < \delta$ implies
$d_Y(f(x_0), f(y)) < \varepsilon$. If, for every $\varepsilon$, the condition holds with the same $\delta$ for every
$x_0 \in A \subseteq X$, then we call $f$ uniformly continuous on the set $A$. It can be shown that if a function $f$
is continuous on a compact set $A$, then it is also uniformly continuous on $A$ (in general,
however, continuity does not imply uniform continuity). In the following, we will need
this result only on functions mapping compact subsets of $\mathbb{R}$ to probability distributions
(the space that we consider there will be the set of hyperreal numbers, which has a
topology but, unfortunately, neither a metric nor a norm).

On a space $X$, we write $\|x\|$ to denote the norm of a vector $x$. One example is the
$\infty$-norm on $\mathbb{R}^n$, which is $\|x\|_\infty = \|(x_1,\ldots,x_n)\|_\infty = \max\{|x_1|,\ldots,|x_n|\}$ for every $x \in \mathbb{R}^n$.

This induces the metric $d_\infty(x, y) = \|x - y\|_\infty$.

It can be shown that every metric space is also a topological space, but the converse
is not true in general. However, the above definition of continuity is (on metric spaces)
equivalent to saying that a function $f : X \to Y$ is continuous if and only if every open
set $B \in \mathcal{T}_Y$ has an open preimage $f^{-1}(B) \in \mathcal{T}_X$, where $\mathcal{T}_X, \mathcal{T}_Y$ denote the topologies
on $X$ and $Y$, respectively. This characterization works without metrics and will be used
later to prove continuity of payoff functions (see lemma 3.1 and proposition 3.2).

Probabilities and Moments: Let $A \subseteq \Omega$ be a subset of some measurable space $\Omega$
(see footnote 2) and $F$ be a probability distribution function. The probability measure $\Pr_F(A)$ is the
Lebesgue-Stieltjes integral $\Pr_F(A) = \int_A dF$ (note that this general formulation covers
both discrete and continuous random variables on the same formal ground). Whenever
the distribution is obvious from the context, we will omit the subscript to the probability
measure, and simply write $\Pr(A)$ as a shorthand of $\Pr_F(A)$.

All probability distribution functions $F$ that we consider in this report will have a
density function $f$ associated with them. If so, then we call the closure of the set
$\{x : f(x) > 0\}$ the support of $F$, denoted as $\mathrm{supp}(F)$. A degenerate distribution on $\mathbb{R}$
is one that assigns probability mass 1 to a finite number (or more generally, a null-set)
of points in $\mathbb{R}$. If $\Pr(A) = 1$ for a singleton set $A = \{a\}$ and $a \in \mathbb{R}$, then we call this
degenerate distribution a point-mass or a Dirac-mass. We stress that such distributions
do not have a density function associated with them in general (see footnote 3).

Many commonly used distributions have infinite support, such as the Gaussian distribution.
The density function can, however, be cut off outside a bounded range $[a, b]$
and re-scaled to normalize to a probability distribution again. This technique lets us
approximate any probability distribution by one with compact support (a technique that
will come in handy in section 2.5).
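The cut-off and re-scaling step can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the report: the interval $[1, 10]$, the Gaussian parameters, and all function names are hypothetical choices made for the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gaussian distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of the Gaussian, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_normal_pdf(x, a, b, mu=0.0, sigma=1.0):
    """Gaussian density cut off outside [a, b] and re-scaled to integrate to 1."""
    if x < a or x > b:
        return 0.0
    mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)  # mass kept inside [a, b]
    return normal_pdf(x, mu, sigma) / mass

def integrate(f, a, b, steps=10_000):
    """Plain midpoint rule, adequate for a smooth density on a bounded interval."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Hypothetical loss model: Gaussian with mean 4 and deviation 2, restricted to [1, 10].
total_mass = integrate(lambda x: truncated_normal_pdf(x, 1.0, 10.0, 4.0, 2.0), 1.0, 10.0)
print(total_mass)  # numerically close to 1
```

The lower bound 1 in the example is chosen so that the truncated loss also respects the requirement that losses are at least 1.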

2 We will not require any further details on measurability or $\sigma$-algebras in this report, so we spare details
or an intuitive explanation of the necessary concepts here.

3 at least not within the space of normal functions; the Dirac-mass is, however, an irregular generalized
function (a concept that we will not need here).

The expectation of a random variable is (by the law of large numbers) the long-run
average of realizations, or more rigorously, defined as $E(X) = \int_\Omega x\,dF(x)$. The $k$-th
moment of a distribution is the expectation of $X^k$, which is denoted and defined
as $m_X(k) := E(X^k) = \int_\Omega x^k\,dF(x)$, or also $E(X^k) = \int_\Omega x^k f(x)\,dx$ if $F$ has a density
function $f$. The first four moments, or values derived from them, play special roles.
One prominent example is the variance $\mathrm{Var}(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2$
(this formula is known as Steiner’s theorem). Of particular importance is the so-called
moment-generating function $\mu_X(s) = E(\exp(s \cdot X))$, from which the $k$-th moment can
be computed by taking the $k$-th order derivative evaluated at the origin, i.e., we have
$E(X^k) = \frac{d^k}{ds^k}\mu_X(s)\big|_{s=0}$. Moments do not necessarily exist for all distributions (an
example is the Cauchy distribution, for which neither the mean nor any higher moment exists), but they exist for
all distributions with compact support (which can be used to approximate every other
distribution up to arbitrary precision).
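As a small numerical check of these formulas, the following Python sketch computes moments of a discrete loss distribution, compares the two variance expressions, and approximates the first derivative of the moment-generating function at the origin by a central difference. The three loss values and their probabilities are made-up illustration data, and the helper names are hypothetical.

```python
import math

def moment(values, probs, k):
    """k-th moment E(X^k) of a discrete distribution given by values and probabilities."""
    return sum(p * x ** k for x, p in zip(values, probs))

def mgf(values, probs, s):
    """Moment-generating function E(exp(s * X)) of the same discrete distribution."""
    return sum(p * math.exp(s * x) for x, p in zip(values, probs))

# Hypothetical discrete loss distribution with compact support {1, 2, 5}.
values = [1, 2, 5]
probs = [0.5, 0.3, 0.2]

mean = moment(values, probs, 1)                      # 0.5 + 0.6 + 1.0 = 2.1
second = moment(values, probs, 2)                    # 0.5 + 1.2 + 5.0 = 6.7
var_by_definition = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
var_by_steiner = second - mean ** 2                  # E(X^2) - (E(X))^2

# First derivative of the moment-generating function at s = 0 (central difference).
h = 1e-5
mgf_slope = (mgf(values, probs, h) - mgf(values, probs, -h)) / (2 * h)

print(mean, var_by_definition, var_by_steiner, mgf_slope)
```

Both variance expressions agree up to rounding, and the numerical derivative reproduces the mean, as the text states for $k = 1$.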

Multivariate distributions model vector-valued random variables. Their distribution
is denoted as $F_{X,Y}$, or shorthanded as $F$. For an $n$-dimensional distribution,
the respective density function is then of the form $f(x_1,\ldots,x_n)$, having the integral
$\int_{\mathbb{R}^n} f(x_1,\ldots,x_n)\,d(x_1,\ldots,x_n) = 1$. This joint distribution in particular models the
interplay between the (perhaps mutually dependent) random variables $X_1,\ldots,X_n$. The
marginal distribution of any of the variables $X_i$ (where $1 \leq i \leq n$) is the unconditional
distribution of $X_i$ no matter what the other variables do. Its density function is obtained
by “integrating out” the other variables, i.e.,

\[ f_{X_i}(x_i) = \int_{\mathbb{R}^{n-1}} f(x_1,\ldots,x_n)\,d(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n). \]

A (marginal) distribution is called uniform if its support is bounded and its density
is a constant. The joint probability of a multivariate event, i.e., a multidimensional set
$A$, w.r.t. a multivariate distribution $F_{X,Y}$ is denoted as $\Pr_{F_{X,Y}}(A)$. That is, the
distribution w.r.t. which the probabilities are taken is given in the subscript, whenever
this is useful or necessary to make things clear.

A particularly important class of distributions are copulas. These are multivariate
probability distribution functions on the $n$-dimensional hypercube $[0,1]^n$, for which all
marginal distributions are uniform. The importance of copula functions is due to Sklar's
theorem, which tells that the joint distribution $F$ of the random vector $(X_1,\ldots,X_n)$
can be expressed in terms of the marginal distribution functions $F_1,\ldots,F_n$ and a copula function $C$ as
$F(x_1,\ldots,x_n) = C(F_1(x_1),\ldots,F_n(x_n))$. So, for example, independence of events can be
modeled by the simple product copula $C(x_1,\ldots,x_n) = x_1 \cdot x_2 \cdots x_n$. Many other classes
of copula functions and a comprehensive discussion of the topic as such can be found in
[8].

Convexity and Concavity: Let $V$ be a vector space. We call a set $A \subseteq V$ convex if for
any two points $x, y \in A$, the entire line segment connecting $x$ to $y$ is also contained in $A$. Let
$f : \mathbb{R} \to \mathbb{R}$ be a function. The function $f$ is called convex if
for every two values $a < b$, the line between $f(a)$ and $f(b)$ upper-bounds $f$ between $a$ and
$b$. More formally, let $L_{a,b}(x)$ be the straight line from $f(a)$ to $f(b)$; then $f$ is convex if
$f(x) \leq L_{a,b}(x)$ for all $a \leq x \leq b$. A function $f$ is called concave if $(-f)$ is convex.

Hyperreal Numbers and Ultrafilters: Take the set $\mathbb{R}^\infty$ of infinite sequences over the
real numbers $\mathbb{R}$. On this set, we can define the arithmetic operations $+$ and $\cdot$ elementwise
on two sequences $a = (a_1, a_2, a_3, \ldots) = (a_n)_{n \in \mathbb{N}} \in \mathbb{R}^\infty$ and $b = (b_1, b_2, b_3, \ldots) =
(b_n)_{n \in \mathbb{N}} \in \mathbb{R}^\infty$ by setting $a + b = (a_1 + b_1, a_2 + b_2, a_3 + b_3, \ldots)$ and
$a \cdot b = (a_1 \cdot b_1, a_2 \cdot b_2, a_3 \cdot b_3, \ldots)$. The ordering of the reals, however, cannot be carried over in this way, as the
sequences $a = (1, 4, 2, \ldots)$ and $b = (2, 1, 4, \ldots)$ would satisfy $\leq$ on some components and
$\geq$ on some others. To fix this, we need to be specific about which indices matter for the
comparison, and which do not. The resulting family of index-sets can be characterized
to be a so-called free ultrafilter, which is defined as follows: a family $\mathcal{U} \subseteq \mathcal{P}(\mathbb{N})$ is called
a filter if the following three properties are satisfied:

• $\emptyset \notin \mathcal{U}$

• closed under supersets: $A \subseteq B$ and $A \in \mathcal{U}$ implies $B \in \mathcal{U}$

• closed under intersection: $A, B \in \mathcal{U}$ implies $A \cap B \in \mathcal{U}$

If, in addition, $A \notin \mathcal{U}$ implies that $\mathcal{U}$ contains the complement set of $A$, then $\mathcal{U}$ is called
an ultrafilter. A simple example of a filter is the Fréchet filter, which is the family
$\{A \subseteq \mathbb{N} : \text{the complement of } A \text{ is finite}\}$. A filter is called free if it contains no finite sets.
An ultrafilter can equivalently be characterized as a filter $\mathcal{U}$ for which any filter that contains $\mathcal{U}$
is equal to $\mathcal{U}$, i.e., $\mathcal{U}$ is maximal w.r.t. the
$\supseteq$-relation. An application of Zorn’s lemma to the semi-ordering induced by $\supseteq$ shows the
existence of free ultrafilters as $\supseteq$-maximal elements extending the Fréchet filter.

An ultrafilter naturally induces an equivalence relation on $\mathbb{R}^\infty$ by virtue of calling
two sequences $a = (a_n)_{n \in \mathbb{N}}$, $b = (b_n)_{n \in \mathbb{N}}$ $\mathcal{U}$-equivalent if and only if $\{i : a_i = b_i\} \in \mathcal{U}$,
i.e., the set of indices on which $a$ and $b$ coincide belongs to $\mathcal{U}$. The $\leq$- and $\geq$-relations
can be defined in exactly the same fashion. The family of equivalence classes modulo
$\mathcal{U}$ makes up the set of hyperreal numbers, i.e., ${}^*\mathbb{R} = \{[a]_\mathcal{U} : a \in \mathbb{R}^\infty\} = \mathbb{R}^\infty/\mathcal{U}$, where
$[a]_\mathcal{U} = \{b \in \mathbb{R}^\infty : a =_\mathcal{U} b\}$. Since the existence of the necessary free ultrafilter is assured
only non-constructively, we lack an explicit model of ${}^*\mathbb{R}$ and are unfortunately unable to
practically do arbitrary arithmetic in ${}^*\mathbb{R}$. It will be shown (later and in part two of this
report) that everything that needs to be computed practically works without $\mathcal{U}$ being
explicitly known.
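The elementwise operations and the failure of a componentwise ordering can be illustrated with a short Python sketch. It is a toy model only: sequences are represented as functions of the index and are inspected on a finite window, so it cannot decide membership in an ultrafilter; all names are hypothetical.

```python
def add(s, t):
    """Elementwise sum of two sequences given as functions of the index i = 1, 2, ..."""
    return lambda i: s(i) + t(i)

def mul(s, t):
    """Elementwise product of two sequences."""
    return lambda i: s(i) * t(i)

def less_on(s, t, horizon):
    """Indices i <= horizon with s(i) < t(i); only a finite window is inspected."""
    return {i for i in range(1, horizon + 1) if s(i) < t(i)}

# First three components of the two sequences a and b from the text.
a = lambda i: (1, 4, 2)[i - 1]
b = lambda i: (2, 1, 4)[i - 1]

sums = [add(a, b)(i) for i in (1, 2, 3)]       # [3, 5, 6]
products = [mul(a, b)(i) for i in (1, 2, 3)]   # [2, 4, 8]
a_less_b = less_on(a, b, 3)                    # {1, 3}
b_less_a = less_on(b, a, 3)                    # {2}: neither a <= b nor b <= a componentwise

# The zero sequence lies below (1, 1/2, 1/3, ...) on every inspected index.
zero = lambda i: 0.0
eps = lambda i: 1.0 / i
zero_less_eps = less_on(zero, eps, 50)

print(sums, products, a_less_b, b_less_a, len(zero_less_eps))
```

For the last pair the comparison holds on all of $\mathbb{N}$, which belongs to every filter, so the class of $(1, 1/2, 1/3, \ldots)$ exceeds zero in ${}^*\mathbb{R}$ regardless of which free ultrafilter is used. For $a$ and $b$, the outcome depends on which of the two index sets the ultrafilter contains.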

Elements of Game Theory: Let $N = \{1, 2, \ldots, n\}$ be a finite set. Let $PS_i$ be a finite
set of actions, and denote by $PS_{-i}$ the cartesian product $PS_{-i} = PS_1 \times PS_2 \times \cdots \times
PS_{i-1} \times PS_{i+1} \times \cdots \times PS_n$, i.e., the product of all $PS_j$ excluding $PS_i$.

A finite non-cooperative $n$-person game is a triple $(N, H, S)$, where the set $H =
\{u_i : PS_i \times PS_{-i} \to \mathbb{R} : i \in N\}$ contains all players' payoff functions, and the family
$S = \{PS_1, \ldots, PS_n\}$ comprises the strategy sets of all players. The attribute finite is
given to the game if and only if all $PS_i$ are finite. An equilibrium strategy is an element
$x^* = (x_1^*, \ldots, x_n^*) \in \prod_{i=1}^n PS_i$ so that all $i \in N$ and all $x_i \in PS_i$ have

\[ u_i(x_i^*, x_{-i}^*) \geq u_i(x_i, x_{-i}^*). \tag{1} \]

That is, action $x_i^*$ gives the maximal outcome for the $i$-th player, provided that all
other players follow their individual equilibrium strategies. Otherwise said, no player
has an incentive to solely deviate from $x^*$, as this would only worsen the revenue from
the gameplay (see footnote 4). It is easy to construct examples in which no such equilibrium strategy
exists. To fix this, one usually considers repetitions of the gameplay, and defines the
revenue for a player as the long-run average of all payoffs in each round. Technically,
this assures the existence of equilibrium strategies in all finite games (Nash’s theorem).
We will implicitly rely on this possibility here too, while explicitly looking at the outcome
of the game in a single round. As this is, by our fundamental hypotheses in this report,
a random variable itself, condition (1) can no longer be soundly defined, as random
variables are not canonically ordered. The core of this work will therefore be on finding a
substitute for the $\geq$-relation, so as to properly restate (1) when random variables appear
on both sides of the inequality.

4 It should be mentioned that this does not necessarily rule out benefits for coalitions of players upon jointly
deviating from the equilibrium strategy. This, however, is the subject of cooperative game theory, which
we do not discuss here any further.
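Condition (1) and the remark that an equilibrium in pure strategies may fail to exist can be checked by brute force for two players. The following Python sketch assumes deterministic, real-valued payoffs that both players maximize; the two example games and the function name are hypothetical illustrations, not taken from the report.

```python
def pure_equilibria(u1, u2):
    """All index pairs (i, j) at which no player gains by deviating unilaterally.

    u1[i][j] and u2[i][j] are the payoffs of player 1 (rows) and player 2
    (columns); both players are assumed to maximize, as in condition (1).
    """
    n, m = len(u1), len(u1[0])
    found = []
    for i in range(n):
        for j in range(m):
            row_is_best = all(u1[i][j] >= u1[k][j] for k in range(n))
            col_is_best = all(u2[i][j] >= u2[i][l] for l in range(m))
            if row_is_best and col_is_best:
                found.append((i, j))
    return found

# Matching pennies: a zero-sum game without an equilibrium in pure strategies.
pennies_1 = [[1, -1], [-1, 1]]
pennies_2 = [[-1, 1], [1, -1]]
no_equilibrium = pure_equilibria(pennies_1, pennies_2)

# A zero-sum game with a saddle point at row 1, column 1 (0-based indices).
saddle_1 = [[3, 1], [4, 2]]
saddle_2 = [[-3, -1], [-4, -2]]
one_equilibrium = pure_equilibria(saddle_1, saddle_2)

print(no_equilibrium, one_equilibrium)
```

The empty result for matching pennies is the kind of example that motivates repeated play and long-run average payoffs.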

2 Optimal Decisions under Uncertainty 

Under the above setting, we can collect all scenarios of actions that player 1 (defender)
and player 2 (attacker) may take in a tabular (matrix) fashion. Our goal in this first step
is to soundly define what “a best action” would be in light of the uncertain, indeed random,
effects that actions on either side cause, especially lacking control over the other’s
actions. For that matter, we will consider the scenario matrix $A$ given below as the
payoff structure of some matrix game, whose mathematical underpinning is the standard
setup of game theory (see [3] for example), with differences and necessary changes to the
classical theory of games being discussed in section 3 and later.

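Let the following tableau be a collection of all scenarios of actions taken by the defender
(row-player) and attacker (column-player),

\[
A = \begin{pmatrix}
R_{11} & \cdots & R_{1j} & \cdots & R_{1m} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
R_{i1} & \cdots & R_{ij} & \cdots & R_{im} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
R_{n1} & \cdots & R_{nj} & \cdots & R_{nm}
\end{pmatrix},
\]

where the rows of the matrix $A$ are labeled by the actions in $PS_1$, and the columns of
$A$ carry the labels of actions from $PS_2$.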

A security strategy for player 1 is an optimal choice $i^*$ of a row so that the risk,
expressed by the random variable $R_{i^*j}$, is “optimized” over all possible actions $j \in PS_2$
of the opponent. Here, we run into trouble already, as there is no canonical ordering on
the set of probability distributions.

To the end of resolving this issue, let us consider repetitions of the gameplay in which 
each player can choose his actions repeatedly and differently, in an attempt to minimize 
risk (or damage). This corresponds to situations in which “the best” configuration simply 
does not exist, and we are forced to repeatedly change or reconsider the configuration 
of the system in order to remain protected. 

In a classical game-theoretic approach, this takes us to the concept of mixed strategies,
which are discrete probability distributions over the action spaces of the players. Making
this rigorous, let $S(PS_i)$ for $i = 1, 2$ denote the simplex over $PS_i$, i.e., the space of all
discrete probability distributions supported on $PS_i$. More formally, given the support
$X$, the set $S(X)$ is

\[ S(X) = \left\{ (p_1, \ldots, p_k) \in \mathbb{R}^k : k = |X|, \ \sum_{i=1}^{k} p_i = 1, \ p_i \geq 0 \ \forall i \right\}. \]

A randomized decision is thus a rule $p = (p_1, \ldots, p_k) \in S(PS)$ to choose from the available
actions $\{1, 2, \ldots, k\}$ from the action set $PS$ (which is $PS_1$ or $PS_2$ hereafter) with
corresponding probabilities $p_i$. We assume the ordering of the actions to be arbitrary
but fixed (for obvious reasons).
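A randomized decision in this sense can be sketched in Python as a membership test for the simplex plus a random draw of an action index. The tolerance, the example vectors, and the function names are assumptions made for illustration only.

```python
import random

def in_simplex(p, tol=1e-12):
    """True if p is a probability vector: nonnegative entries summing to 1 (up to tol)."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) <= tol

def draw_action(p, rng):
    """Draw an action index from {1, ..., k} with probabilities p_1, ..., p_k."""
    return rng.choices(range(1, len(p) + 1), weights=p, k=1)[0]

rng = random.Random(2024)           # fixed seed, only to make the example repeatable
p = [0.2, 0.5, 0.3]                 # hypothetical mixed strategy over three actions

valid = in_simplex(p)                       # True
invalid = in_simplex([0.7, 0.7, -0.4])      # False: sums to 1 but has a negative entry
draws = [draw_action(p, rng) for _ in range(1000)]
frequencies = [draws.count(i) / len(draws) for i in (1, 2, 3)]

# A degenerate mixed strategy reproduces a pure strategy.
pure = [draw_action([0.0, 1.0, 0.0], rng) for _ in range(20)]

print(valid, invalid, frequencies, set(pure))
```

The empirical frequencies approach $p$ as the number of draws grows, which matches the long-run reading of a mixed strategy under repeated gameplay.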

Now, we return to the problem of what effect to expect when the current configuration
of the system is randomly drawn from $PS_1$, and the adversary’s action is another
random choice from $PS_2$. For that matter, let us simplify notation by putting
$S_1 := S(PS_1)$, $S_2 := S(PS_2)$ and let the two mixed strategies be $p \in S_1$ for player 1,
and $q \in S_2$ for player 2.

Since the choice from the matrix $A$ is random, where the row is drawn with likelihoods
as specified by $p$, and the column is drawn from $q$, the law of total probability yields for
the outcome $R$,

\[ \Pr(R \leq r) = \sum_{i,j} \Pr(R_{ij} \leq r \mid i, j)\,\Pr(i, j), \tag{2} \]

where $\Pr(R_{ij} \leq r \mid i, j)$ is the conditional probability of $R_{ij} \leq r$ given a particular choice $(i, j)$,
and $\Pr(i, j)$ is the (unconditional) probability for this choice to occur. Section 2.1 gives
some more details on how $\Pr(i, j)$ can be modeled and expressed.

Denote by $F(p, q)$ the distribution of the game’s outcome under strategies $(p, q) \in
S_1 \times S_2$; then $\Pr(R \leq r) = F(r)$ depends on $(p, q)$, and (2) can be rewritten as

\[ \Pr(R \leq r) = (F(p, q))(r) = \sum_{i,j} F_{ij}(r)\,C_{p,q}(i, j), \tag{3} \]

where $C_{p,q}(i, j) = \Pr(i, j)$ will be assumed to be continuous in $p$ and $q$ for technical reasons
that will become evident later (during the proof of proposition 3.2). Note that the
distribution $F$ via the function $C$ explicitly depends on the choices $p, q$, and is to be
“optimally shaped” w.r.t. these two variables. The argument $r \in \mathbb{R}$ to the function
$F(p, q)(\cdot)$ is the (random) “revenue”, whose uncertainty is outside either player’s
influence (besides shaping $F$ by proper choices of $p$ and $q$).
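Formula (3) is a mixture of the scenario distributions $F_{ij}$, which the following Python sketch evaluates for a $2 \times 2$ game. It assumes independent choices, i.e., $C_{p,q}(i,j) = p_i q_j$, which is only one admissible choice of $C_{p,q}$; the uniform loss distributions and all names are hypothetical illustration data.

```python
def uniform_cdf(a, b):
    """Distribution function of the uniform distribution on [a, b]."""
    def cdf(r):
        if r <= a:
            return 0.0
        if r >= b:
            return 1.0
        return (r - a) / (b - a)
    return cdf

def mixture_cdf(F, p, q, r):
    """(F(p, q))(r) = sum over i, j of F_ij(r) * p_i * q_j (independent choices assumed)."""
    return sum(p[i] * q[j] * F[i][j](r)
               for i in range(len(p)) for j in range(len(q)))

# Hypothetical 2 x 2 scenario matrix of loss distributions, all supported above 1.
F = [[uniform_cdf(1, 3), uniform_cdf(2, 6)],
     [uniform_cdf(1, 5), uniform_cdf(1, 2)]]
p = [0.5, 0.5]      # mixed strategy of player 1 (rows)
q = [0.25, 0.75]    # mixed strategy of player 2 (columns)

at_three = mixture_cdf(F, p, q, 3.0)
# 0.125 * 1 + 0.375 * 0.25 + 0.125 * 0.5 + 0.375 * 1 = 0.65625
print(at_three)
```

Changing $p$ or $q$ reweights the same four scenario distributions, which is the sense in which the players shape $F(p, q)$ without controlling the randomness inside each $R_{ij}$.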

The “revenue” R in the game can be of manifold nature, such as 

• Risk response of society; a quantitative measure that could rate people’s opinions
and confidence in the utility infrastructure,

• Repair cost to recover from an incident’s implied damage, 

• Reliability, if the game is about whether or not a particular quality of service can 
be kept up, 

• etc. 
Remark 2.1 In the simplest case of independent actions, we would set $C_{p,q}(i,j) =
p_i \cdot q_j$ when $p = (p_1, \ldots, p_n)$, $q = (q_1, \ldots, q_m)$. This choice, along with assuming $R_{ij}$
to be constants rather than random variables, recreates the familiar matrix-game payoff
functional $p^T A q$ from (3). Hence, (3) is a first generalization of matrix games to games
with uncertain outcome, which for the sake of flexibility and generality, is
“distribution-valued”.

Remark 2.2 In reality, it may be the case that the actions of the two players are not chosen
independently, for example, if both players possess some common knowledge or have
access to a common source of information. In game-theoretic terms, this would lead
to so-called correlated equilibria (see [3]), in which the players share two correlated
random variables that influence their choices. Things here are nevertheless different,
as no bidirectional flow of information can be assumed like for correlated equilibria (the
attacker will not inform the utility infrastructure provider about anything in advance, while
information from the provider may somehow leak out to the adversary).

2.1 Choosing Actions (In)dependently 

The concrete choice of the function $C_{p,q}$ is only subject to continuity in $p, q$, for technical
reasons that will receive a closer look now. The general joint probability of the
scenario $(i, j)$ w.r.t. the marginal discrete distribution vectors $p, q$ is $\Pr_{p,q}\{i, j\} =
\Pr_{p,q}\{X = i, Y = j\} = C_{p,q}(i, j)$ in (2). Under independence of the random choices
$X \sim p$, $Y \sim q$, this can be written as $\Pr(i, j) = \Pr(X = i)\Pr(Y = j) = p_i q_j$.

Now, let us consider cases where the choices are not independent, say, if one player
observes the other player’s actions and can react to them (or if both players have access
to a common source of information).

Sklar’s theorem implies the existence of a copula function $C$ so that the joint
distribution $F_{(X,Y)}$ can be written in terms of the copula $C$ and the marginal distributions
$F_X$, corresponding to the vector $p$, and $F_Y$, corresponding to the vector $q$,

\[ F_{(X,Y)}(i,j) = \Pr(X \le i, Y \le j) = C(F_X(i), F_Y(j)). \]

Consequently,

\[ \begin{aligned}
\Pr(i,j) = \Pr(X = i, Y = j) &= \Pr(X \le i, Y \le j) - \Pr(X \le i-1, Y \le j) \\
&\quad - \Pr(X \le i, Y \le j-1) + \Pr(X \le i-1, Y \le j-1) \\
&= C(F_X(i), F_Y(j)) - C(F_X(i-1), F_Y(j)) \\
&\quad - C(F_X(i), F_Y(j-1)) + C(F_X(i-1), F_Y(j-1)) \\
&= C(p_i, q_j) - C(p_{i-1}, q_j) - C(p_i, q_{j-1}) + C(p_{i-1}, q_{j-1}).
\end{aligned} \tag{4} \]

Thus, the function $C_{p,q}$ can be constructed from (4) based on the copula $C$ (which must
exist). Continuity of $C_{p,q}$ thus hinges on the continuity of the copula function. At least
two situations admit a choice of $C$ that makes $C_{p,q}$ continuous:

• Independence of actions: $C(x, y) := x \cdot y$

• Complete lack of knowledge about the interplay between the action choices, in
which case we can set $C(x,y) := \min\{x, y\}$.

This choice is justified by the well-known Fréchet-Hoeffding bound, which says
that every $n$-dimensional copula function $C$ satisfies

\[ C(u_1, u_2, \ldots, u_n) \le \min\{u_1, \ldots, u_n\}. \]

Since the min-function is itself a copula, it can be chosen if a dependency is known
to exist, but with no details on the particular nature of the interplay. Observe
that this corresponds to the well-known maximum-principle of system security,
where the overall system risk is determined from the maximum risk among its
components (alternatively, you may think of a chain being as strong as its weakest
element, which corresponds to the min-function among all indicators $u_1, \ldots, u_n$).
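
To make the construction (4) concrete, the following minimal Python sketch builds the
joint probabilities from two marginal vectors and a copula. It evaluates the copula at the
cumulative sums $F_X(i) = p_1 + \ldots + p_i$ and $F_Y(j) = q_1 + \ldots + q_j$, as in the
second line of (4). The function names and the example vectors are illustrative
assumptions and not part of the original report.

```python
from itertools import accumulate


def product_copula(u, v):
    """Independence copula C(x, y) = x * y."""
    return u * v


def min_copula(u, v):
    """Upper Frechet-Hoeffding bound C(x, y) = min(x, y)."""
    return min(u, v)


def joint_pmf(p, q, copula):
    """Joint probabilities Pr(X = i, Y = j) from the marginals p, q via (4)."""
    # Cumulative distribution values with a leading zero, so that index 0
    # plays the role of F(i - 1) for the first action.
    Fx = [0.0] + list(accumulate(p))
    Fy = [0.0] + list(accumulate(q))
    # Rectangle (inclusion-exclusion) formula from (4).
    return [
        [
            copula(Fx[i], Fy[j]) - copula(Fx[i - 1], Fy[j])
            - copula(Fx[i], Fy[j - 1]) + copula(Fx[i - 1], Fy[j - 1])
            for j in range(1, len(Fy))
        ]
        for i in range(1, len(Fx))
    ]


p = [0.2, 0.5, 0.3]  # hypothetical mixed strategy of player 1
q = [0.6, 0.4]       # hypothetical mixed strategy of player 2
independent = joint_pmf(p, q, product_copula)
dependent = joint_pmf(p, q, min_copula)
```

With the product copula, the result coincides with $p_i q_j$. With the min-copula, the
marginals are preserved, but the probability mass is redistributed among the action pairs.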


2.2 Comparing Payoff Distributions 

There appears to be no canonical way to compare payoff distributions, as F(p, q) can
be determined by an arbitrary number of parameters, thus introducing ambiguity in
how to compare them. To see this, simply consider the set of normal distributions
$\mathcal{N}(\mu, \sigma^2)$, each being determined by two parameters $\mu$ and $\sigma > 0$.
Since the pair $(\mu, \sigma)$ uniquely determines the distribution function, a comparison
between two members $F_1, F_2 \in \mathcal{N}$ amounts to a criterion to compare
two-dimensional vectors $(\mu, \sigma) \in \mathbb{R} \times \mathbb{R}^+ \subset \mathbb{R}^2$. It
is well-known that $\mathbb{R}^2$ carries no canonical ordering (being isomorphic to $\mathbb{C}$,
on which provably no ordering compatible with the field operations exists; see [1] for a
proof), and hence there is no natural ordering on the set of probability distributions either.

Despite this sounding like bad news, we can actually construct an alternative
characterization of probability distributions on a new space, in which the distributions of
interest, in our case F(p, q), will all be members of a totally ordered subset.

To this end, we will rely on a characterization of a probability distribution of the
random variable $R \sim F(p,q)$ via the sequence $(m_R(k))_{k \in \mathbb{N}}$ of its moments. The
$k$-th such moment is from (3) and by assumption 1.3 found to be

\[ \begin{aligned}
[E(R^k)](p,q) &= \int_{-\infty}^{\infty} x^k \, dF(p,q)
  = \int_{-\infty}^{\infty} x^k \sum_{i,j} f_{ij}(x) \, C_{p,q}(i,j) \, dx \\
&= \sum_{i,j} C_{p,q}(i,j) \int_{-\infty}^{\infty} x^k f_{ij}(x) \, dx
  = \sum_{i,j} C_{p,q}(i,j) \, E(R_{ij}^k),
\end{aligned} \tag{5} \]

where the sum runs over $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, and $f_{ij}$ is the
probability density of $R_{ij}$ for all $i, j$. Notice that the boundedness condition in
assumption 1.3 assures existence and finiteness of all these moments. However, assumption 1.3
yields even more: since $R \sim F(p, q)$ is a random variable within $[0, \infty)$ (nonnegativity) and
has finite moments by the boundedness assumption, the distribution F(p, q) is uniquely
determined by the sequence of moments. This is made rigorous by the following lemma:

Lemma 2.3 Let two random variables $X, Y$ have moment generating functions
$\mu_X(s), \mu_Y(s)$ that exist within a neighborhood $U_\varepsilon(0)$. Assume that
$E(X^k) = E(Y^k)$ for all $k \in \mathbb{N}$. Then $X$ and $Y$ have the same distribution.

The proof is merely a collection of well-known facts about moment generating functions 
and the identity of their power-series expansions. For convenience and completeness, we 
nevertheless give the proof in (almost) full detail. 

Proof (of lemma 2.3). Let $Z$ be a general random variable. The finiteness of the
moment-generating function $\mu_Z$ within some open set $(-s_0, s_0)$ with $s_0 > 0$ yields
$E(Z^k) = \mu_Z^{(k)}(0)$ via the $k$-th order derivative of $\mu_Z$ [2, Theorem 3.4.3]. Furthermore,
if the moment generating function exists within $(-s_0, s_0)$, then it has a Taylor-series
expansion (cf. [5, Sec.11.6.1]),

\[ \mu_Z(s) = \sum_{k=0}^{\infty} \frac{\mu_Z^{(k)}(0)}{k!} s^k, \quad \forall s \in (-s_0, s_0). \tag{6} \]

Identity of moments between $X$ and $Y$ (the lemma’s hypothesis) thus implies the
identity of the Taylor-series expansions of $\mu_X$ and $\mu_Y$ and in turn the identity
$\mu_X(s) = \mu_Y(s)$ on $(-s_0, s_0)$. This equation finally implies that $X$ and $Y$ have the
same distribution by the uniqueness theorem of moment-generating functions [2, Theorem 3.4.6]. □

Lemma 2.3 permits us to characterize random variables solely by their moment
sequences, which uniquely pin down the probability distribution, i.e., we will hereafter write
$m_R(k) := E(R^k)$, and use

\[ (m_R(k))_{k \in \mathbb{N}} \text{ to represent the random variable } R \sim F(p,q). \tag{7} \]
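
As an illustration of the representation (7), the following sketch computes the first few
moments of $R \sim F(p,q)$ through the mixture formula (5). It rests on two assumptions that
are not taken from the report: the actions are chosen independently, so that
$C_{p,q}(i,j) = p_i q_j$, and each $R_{ij}$ is uniformly distributed on $[1, b_{ij}]$ with
hypothetical upper bounds $b_{ij}$.

```python
def uniform_moment(k, lo, hi):
    """k-th raw moment of the uniform distribution on [lo, hi]."""
    return (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))


def mixture_moments(p, q, upper, n_moments):
    """First n_moments raw moments of R ~ F(p, q), following (5).

    Assumes independent action choices (weights p[i] * q[j]) and
    R_ij uniform on [1, upper[i][j]].
    """
    moments = []
    for k in range(1, n_moments + 1):
        m_k = sum(
            p[i] * q[j] * uniform_moment(k, 1.0, upper[i][j])
            for i in range(len(p))
            for j in range(len(q))
        )
        moments.append(m_k)
    return moments


p = [0.5, 0.5]                     # hypothetical strategy of player 1
q = [0.25, 0.75]                   # hypothetical strategy of player 2
upper = [[2.0, 3.0], [4.0, 2.0]]   # hypothetical upper bounds b_ij
moment_prefix = mixture_moments(p, q, upper, 4)
```

Only a finite prefix of the sequence can be computed this way. The ordering discussed
below depends on the behavior of the sequence for large indices.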

Let $\mathbb{R}^\infty$ denote the set of all sequences, on which we define a partial ordering by
virtue of the above characterization as follows: let $F_1 = F(p_1, q_1)$, $F_2 = F(p_2, q_2)$ be
two distributions defined by (3). As a first try, we could define a preference relation
between two distributions $F_1, F_2$ by comparing their moment sequences element-wise,
i.e., we would prefer $F_1$ over $F_2$ if the respective moments satisfy $m_{R_1}(k) \le m_{R_2}(k)$ for
all $k$ whenever $R_1 \sim F_1$ and $R_2 \sim F_2$.

It must be stressed that without extra conditions, this ordering is at most a partial one,
since the two moment sequences could alternate infinitely often in which of them is larger. To
make the ordering total, we have to be specific about which indices matter and which do not.
The result will be a standard ultrapower construction, so let $U$ denote an arbitrary
ultrafilter. Fortunately, the preference ordering by comparing moments elementwise
is ultimately independent of the particular ultrafilter in use. This is made precise in
theorem 2.5, which is implied by a simple analysis of continuous distributions. We treat
these first and discuss the discrete case later, as all of our upcoming findings remain
valid under the discrete setting.

The Continuous Case:

Lemma 2.4 For any two probability distributions $F_1, F_2$ and associated random
variables $R_1 \sim F_1$, $R_2 \sim F_2$ that satisfy assumption 1.3, there is a $K \in \mathbb{N}$ so that either
$[\forall k \ge K : m_{R_1}(k) \le m_{R_2}(k)]$ or $[\forall k \ge K : m_{R_1}(k) \ge m_{R_2}(k)]$.

Proof. Let $f_1, f_2$ denote the densities of the distributions $F_1, F_2$. Fix the smallest $b^* > 1$
so that $\Omega := [1, b^*]$ covers both the supports of $F_1$ and $F_2$. Consider the difference of
the $k$-th moments, given by

\[ \Delta(k) := E(R_1^k) - E(R_2^k) = \int_\Omega x^k f_1(x)\,dx - \int_\Omega x^k f_2(x)\,dx
   = \int_\Omega x^k (f_1 - f_2)(x)\,dx. \tag{8} \]

Towards a lower bound to (8), we distinguish two cases:

1. If $f_1(x) > f_2(x)$ for all $x \in \Omega$, then $(f_1 - f_2)(x) > 0$ and because $f_1, f_2$ are
continuous, their difference attains a minimum $\lambda_2 > 0$ on the compact set $\Omega$. So,
we can lower-bound (8) as $\Delta(k) \ge \lambda_2 \int_\Omega x^k\,dx \to +\infty$, as $k \to \infty$.

2. Otherwise, we look at the right end of the interval $\Omega$, and define

\[ a^* := \inf\{x \ge 1 : f_1(x) > f_2(x)\}. \]

Without loss of generality, we may assume $a^* < b^*$. To see this, note that if
$f_1(b^*) \ne f_2(b^*)$, then the continuity of $f_1 - f_2$ implies $f_1(x) \ne f_2(x)$ within a range
$(b^* - \varepsilon, b^*]$ for some $\varepsilon > 0$, and $a^*$ is the supremum of all these $\varepsilon$. Otherwise,
if $f_1(x) = f_2(x)$ on an entire interval $[b^* - \varepsilon, b^*]$ for some $\varepsilon > 0$, then $f_1 \not> f_2$
on $\Omega$ (the opposite of the previous case) implies the existence of some $\xi < b^*$ so
that $f_1(x) < f_2(x)$, and $a^*$ is the supremum of all these $\xi$ (see figure 1 for an
illustration). In case that $\xi = 0$, we would have $f_1 \ge f_2$ on $\Omega$, which is either
trivial (as $\Delta(k) = 0$ for all $k$ if $f_1 = f_2$) or otherwise covered by the previous case.

In either situation, we can fix a compact interval $[a,b] \subset (a^*, b^*) \subset [1, b^*] = \Omega$
and two constants $\lambda_1, \lambda_2 > 0$ (which exist because $f_1, f_2$ are bounded as being
continuous on the compact set $\Omega$), so that the function

\[ \ell(k,x) := \begin{cases}
-\lambda_1 x^k, & \text{if } 1 \le x < a; \\
\lambda_2 x^k, & \text{if } a \le x \le b
\end{cases} \]

lower-bounds the difference of densities in (8) (see figure 1), and

\[ \begin{aligned}
\Delta(k) &= \int_1^{b^*} x^k (f_1 - f_2)(x)\,dx \ge \int_1^{b} \ell(k,x)\,dx
  = -\lambda_1 \int_1^a x^k\,dx + \lambda_2 \int_a^b x^k\,dx \\
&\ge -\frac{a^{k+1}}{k+1}(\lambda_1 + \lambda_2) + \lambda_2 \frac{b^{k+1}}{k+1} \to +\infty,
\end{aligned} \]

as $k \to \infty$ due to $a < b$ and because $\lambda_1, \lambda_2$ are constants that depend only on $f_1, f_2$.

Figure 1: Lower-bounding the difference of densities

In both cases, we conclude that, unless $f_1 = f_2$, $\Delta(k) > 0$ for sufficiently large
$k \ge K$ where $K$ is finite.

□
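
The statement of lemma 2.4 can be checked numerically on a small example. The sketch
below uses two hypothetical distributions that are not taken from the report: $R_1$ uniform
on $[1, 3]$ and $R_2$ uniform on $[2, 2.5]$, whose moments are available in closed form.
Although $R_2$ has the larger mean, the wider support of $R_1$ eventually dominates, so the
sign of $\Delta(k)$ from (8) stabilizes after finitely many indices. The search only inspects
a finite horizon and is therefore an illustration, not a proof.

```python
def uniform_moment(k, lo, hi):
    """k-th raw moment of the uniform distribution on [lo, hi]."""
    return (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))


def delta(k):
    """Difference of k-th moments as in (8), for the two example distributions."""
    return uniform_moment(k, 1.0, 3.0) - uniform_moment(k, 2.0, 2.5)


def stabilization_index(diff, horizon):
    """Smallest K such that diff(k) has the same sign for all K <= k <= horizon."""
    signs = [diff(k) > 0 for k in range(1, horizon + 1)]
    K = horizon
    # Walk backwards while the sign at index K - 1 agrees with the final sign.
    while K > 1 and signs[K - 2] == signs[horizon - 1]:
        K -= 1
    return K


K = stabilization_index(delta, horizon=60)
```

In this example, the first moments favor $R_1$ (they are smaller), while all moments from
index $K$ onwards favor $R_2$, which is what the tail-based comparison registers.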


Theorem 2.5 Let $\mathfrak{F}$ be the set of distributions that satisfy assumption 1.3. Assume
the elements of $\mathfrak{F}$ to be represented by hyperreal numbers in $\mathbb{R}^\infty/U$, where $U$ is any free
ultrafilter. There exists a total ordering on the set $\mathfrak{F}$ that is independent of $U$.

Proof. Let $F_1, F_2$ be two probability distributions, and let $R_1 \sim F_1$, $R_2 \sim F_2$. Lemma
2.4 assures the existence of some $K \in \mathbb{N}$ so that, w.l.o.g., we may define the ordering
$F_1 \preceq F_2$ iff $m_{R_1}(k) \le m_{R_2}(k)$ whenever $k \ge K$. Let $L$ be the set of indices where
$m_{R_1}(k) \le m_{R_2}(k)$; then the complement set $\mathbb{N} \setminus L$ is finite (it has at most $K - 1$ elements).
Let $U$ be an arbitrary free ultrafilter. Since $\mathbb{N} \setminus L$ is finite, it cannot be contained in $U$ as
$U$ is free. And since $U$ is an ultrafilter, it must contain the complement of a set, unless it
contains the set itself. Hence, $L \in U$, and the claim follows. The converse case is treated
analogously. □

Now, we can state our preference criterion on distributions on the quotient space
$\mathfrak{F} \subset \mathbb{R}^\infty/U$, in which each probability distribution of interest is represented by its
sequence of moments. Thanks to theorem 2.5, there is no need to construct the ultrafilter $U$ in
order to well-define best responses, since two distributions will compare in the same fashion
under any admissible choice of $U$.

Definition 2.6 (Preference Relation over Probability Distributions) Let $R_1, R_2$
be two random variables whose distributions $F_1, F_2$ satisfy assumption 1.3. We prefer
$F_1$ over $F_2$ relative to an ultrafilter $U$, written as

\[ F_1 \preceq F_2 :\iff \exists K \in \mathbb{N} \text{ s.t. } \forall k \ge K : m_{R_1}(k) \le m_{R_2}(k). \tag{9} \]

Strict preference of $F_1$ over $F_2$ is denoted as

\[ F_1 \prec F_2 :\iff \exists K \in \mathbb{N} \text{ s.t. } \forall k \ge K : m_{R_1}(k) < m_{R_2}(k). \]

Theorem 2.5 establishes this definition to be compatible with (in the sense of being
a continuation of) the ordering on the hyperreals $\mathbb{R}^\infty/U$, being defined as $a \le b$ iff
$\{i : a_i \le b_i\} \in U$, when $a, b$ are represented by sequences $(a_i)_{i \in \mathbb{N}}$, $(b_i)_{i \in \mathbb{N}}$.

By virtue of the $\preceq$-relation, we can define an equivalence $\equiv$ between two distributions
in the canonical way as

\[ F_1 \equiv F_2 :\iff (F_1 \preceq F_2) \wedge (F_2 \preceq F_1). \tag{10} \]

Within the quotient space $\mathfrak{F} \subset \mathbb{R}^\infty/U$, we thus consider two distributions as identical
if only a finite set of moments between them mismatch. Observe that this does not imply
the identity of the distribution functions themselves, unless actually all moments match.

The strict preference relation $\prec$ induces an order topology $\mathcal{T}$ on $\mathfrak{F}$, whose open sets
are for any two distributions $F_1, F_2$,

\[ (F_1, F_2) := \{F \in \mathfrak{F} : F_1 \prec F \prec F_2\}, \]

and the topology is denoted as $\mathcal{T} = \{(F_1, F_2) \mid F_1, F_2 \in \mathfrak{F} \text{ where } F_1 \prec F_2\}$.

The Discrete Case: In situations where the game’s payoffs are better modeled by 
discrete random variables, say if a nominal scale (“low”, “medium”, “high”) or a scoring 
scheme is used to express revenue, assumption 1.3 is too strong in the sense of prescribing 
a continuous density where the model density is actually discrete. 

Assumption 1.3 covers discrete distributions that possess a density w.r.t. the counting
measure. The line of arguments as used in the proof of lemma 2.4 remains intact without
change, except for the obvious difference that $\Omega$ is a finite (and hence discrete) set now.
Likewise, all conclusions drawn from lemma 2.4, including theorem 2.5, as well as the
definitions of ordering and topology transfer without change.

2.3 Comparing Discrete and Continuous Distributions 

The representation (7) of distributions by the sequence of their moments works even
without assuming the density to be continuous. Therefore, it elegantly lets us compare
distributions of mixed type, i.e., continuous vs. discrete distributions, on a common
basis.

It follows that we can, without any changes to the framework, compare discrete
to continuous distributions, or any two distributions of the same type, in terms of the
$\preceq$-, $\prec$- and $\equiv$-relations. This comparison is, obviously, only meaningful if the respective
random variables live in the same (metric) space. For example, it would be meaningless
to compare ordinal to numeric data.

2.4 Comparing Deterministic to Random

In certain occasions, the consequence of an action may result in perfectly foreseeable
effects, such as fines or similar. Such deterministic outcomes can be modeled as degenerate
distributions (point- or Dirac-masses)⁵. These are singular and thus violate assumption
1.3, since there is no density function associated with them, unless one is willing to resort
to generalized functions, which we do not do in this report. Nevertheless, it is possible
to work out the representation in terms of moment sequences. If $X$ is a random variable
that deterministically takes on the constant value $a$ all the time, then the respective
moment sequence has elements $E(X^k) = E(a^k) = a^k$ for all $k \in \mathbb{N}$. Given another
non-degenerate distribution with density function $f$, supported on $\Omega = [0, b]$, we can
lower- or upper-bound the moments of the respective random variable $Y$ by exponential
functions in $k$, which can straightforwardly be $\preceq$-, $\equiv$- or $\prec$-compared to the representative
$(a^k)_{k \in \mathbb{N}}$ of the (deterministic) outcome $a \in \mathbb{R}$. Algorithmic details will follow in part
two of this research report.
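
A small worked instance may help here. The sketch below compares the moment sequence
$(a^k)$ of a deterministic outcome $a$ with that of a random outcome $Y$ which is assumed,
purely for illustration, to be uniform on $[0, b]$, so that $E(Y^k) = b^k/(k+1)$. For
$a < b$, the inequality $a^k < b^k/(k+1)$ persists once it holds, because
$k \log(b/a) - \log(k+1)$ is convex in $k$ and vanishes at $k = 0$. The function name and
the chosen numbers are hypothetical.

```python
from fractions import Fraction


def uniform_moment(b, k):
    """E(Y^k) = b^k / (k + 1) for Y uniform on [0, b], in exact arithmetic."""
    return Fraction(b) ** k / (k + 1)


def constant_moment(a, k):
    """E(X^k) = a^k for the deterministic outcome X = a."""
    return Fraction(a) ** k


def first_k_constant_smaller(a, b, horizon=200):
    """First index k <= horizon with a^k < E(Y^k), or None if there is none."""
    for k in range(1, horizon + 1):
        if constant_moment(a, k) < uniform_moment(b, k):
            return k
    return None


K = first_k_constant_smaller(2, 3)        # deterministic 2 vs. uniform on [0, 3]
no_index = first_k_constant_smaller(3, 3)  # a = b: the constant is never smaller
```

In the first case, the deterministic outcome has the smaller moments from index $K$ on and
is thus strictly preferred in the sense of definition 2.6, even though it exceeds the mean
of $Y$.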

2.5 Extensions: Relaxing Assumption 1.3 

Risk management is often required to handle or avoid extreme (catastrophic) events.
The respective statistical models are distributions with so-called “heavy”, “long” or
“fat” tails (exact definitions and distribution models will follow in part two of this
report). Extreme-value distributions such as the Gumbel distribution, or also the Cauchy
distribution (which is not an extreme value model), are two natural examples that fall into
the class of distributions that assign unusually high likelihood to large outcomes (that
may be considered as catastrophic consequences of an action). In any case, our assumption
1.3 rules out such distributions by requiring compact support. Even worse, the
$\preceq$-relation based on the representation of a distribution by the sequence of its moments
cannot be extended to cover distributions with heavy tails, as those typically do not
have finite moments or moment-generating functions. Nevertheless, such distributions
are important tools in risk management.

Things are, however, not drastically restricted by assumption 1.3, for at least two
reasons: First, compactness of the support is not necessary for all moments to exist,
as the Gaussian distribution has moments of all orders and is supported on the entire
real line (thus violating even two of the three conditions of assumption 1.3). Still, it
is characterized entirely by its first two moments, and thus can easily be compared in
terms of the $\preceq$-relation.

Second, and more importantly, any distribution with infinite support can be
approximated by a truncated distribution. Given a random variable $X$ with distribution function
$F$, the truncated distribution is the distribution of $X$ conditional on $X$ falling into
a finite range, i.e., the truncated distribution function $\hat{F}$ gives the conditional likelihood

\[ \hat{F}(x) = \Pr(X \le x \mid a \le X \le b). \]

Provided that $F$ has a density function $f$, the truncated density function is

\[ \hat{f}(x) = \begin{cases}
\dfrac{f(x)}{F(b) - F(a)}, & a \le x \le b; \\
0, & \text{otherwise.}
\end{cases} \]

In other words, we simply crop the density $f$ outside the interval $[a, b]$ and re-scale the
resulting function to become a probability distribution again.

⁵ Note that the canonic embedding of the reals within the hyperreals represents a number $a \in \mathbb{R}$ by the
constant sequence $(a, a, \ldots)$. Picking up this idea would be critically flawed in our setting, as any
such constant sequence would be preferred over any probability distribution (whose moment sequence
diverges and thus overshoots $a$ inevitably and ultimately).

Since every distribution function $F$ is non-decreasing and satisfies $\lim_{x \to \infty} F(x) = 1$,
any choice of $\delta > 0$ admits a value $b$ such that $F(b) > 1 - \delta$. Moreover, since our random
variables are all non-negative, we have $\lim_{a \to 0^+} F(a) = \lim_{a \to 0^+} \int_0^a f(x)\,dx = 0$, since
$F$ is right-continuous. It follows that the truncated distribution density for variables of
interest in our setting simplifies to $\hat{f}(x) = f(x)/F(b)$. Now, let us compare a distribution
$F$ to its truncated version $\hat{F}$ in terms of the probabilities that we would compute:

\[ \begin{aligned}
|F(x) - \hat{F}(x)| &= \left| \int_0^x f(t)\,dt - \int_0^x \frac{f(t)}{F(b)}\,dt \right| \\
&= \left| \int_0^x \underbrace{\left( 1 - \frac{1}{F(b)} \right)}_{|\cdot| < \varepsilon} f(t)\,dt \right|
 < \varepsilon \int_0^\infty f(t)\,dt = \varepsilon,
\end{aligned} \]

for sufficiently large $b$, which depends on the chosen $\varepsilon > 0$ that determines the quality
of approximation. Consequently, we can always find a truncated distribution $\hat{F}$ that
approximates $F$ up to an arbitrary precision $\varepsilon > 0$. This shows that restricting ourselves
to distributions with compact support, i.e., adopting assumption 1.3, causes no more
than a numerical error that can be made as small as we wish.
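
The approximation argument can be traced on a concrete case. The sketch below assumes,
for illustration only, an exponential distribution with $F(x) = 1 - e^{-\text{rate} \cdot x}$, truncates it
to $[0, b]$, and measures the largest deviation between $F$ and $\hat{F}$ on a grid. For this
model, the deviation is bounded by $1 - F(b)$, so a cutoff $b = \ln(1/\varepsilon)/\text{rate}$ suffices
for precision $\varepsilon$. All names and parameter values are hypothetical.

```python
import math


def cdf(x, rate=1.0):
    """Distribution function of the (assumed) exponential model."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def truncated_cdf(x, b, rate=1.0):
    """Distribution function of X conditional on 0 <= X <= b."""
    if x >= b:
        return 1.0
    return cdf(x, rate) / cdf(b, rate)


def max_error(b, rate=1.0, grid=2000, x_max=50.0):
    """Largest deviation |F(x) - F_hat(x)| over an equidistant grid on [0, x_max]."""
    xs = [x_max * i / grid for i in range(grid + 1)]
    return max(abs(cdf(x, rate) - truncated_cdf(x, b, rate)) for x in xs)


def cutoff_for(eps, rate=1.0):
    """Cutoff b with 1 - F(b) = eps for the exponential model."""
    return -math.log(eps) / rate


b = cutoff_for(1e-3)
error_at_b = max_error(b)
```

The grid-based maximum only approximates the supremum, which is adequate here because
both distribution functions are monotone and continuous away from the cutoff.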

More interestingly, we could attempt to play the same trick as before, and characterize
a distribution with fat, heavy or long tails by a sequence of approximations to it,
arising from better and better precisions $\varepsilon \to 0$. In that sense, we could hope to compare
approximations rather than the true density in an attempt to extend the preference and
equivalence relations $\preceq$ and $\equiv$ to distributions with fat, heavy or long tails.

Unfortunately, this hope is unfounded, as a distribution is not uniquely characterized
by a general sequence of approximations (i.e., we cannot formulate an equivalent to
lemma 2.3), and the outcome of a comparison of approximations is not invariant to
how the approximations are chosen (i.e., there is also no analogue of lemma 2.4). To see
the latter, take the quantile function $F^{-1}(\alpha)$ for a distribution $F$, and consider the
tail quantiles $\bar{F}(\alpha) = F^{-1}(1 - \alpha)$. Pick any sequence $(\alpha_n)_{n \in \mathbb{N}}$ with $\alpha_n \to 0$. Since
$\lim_{x \to \infty} F(x) = 1$, the tail quantile sequence behaves like $\bar{F}(\alpha_n) \to \infty$, where the limit
is independent of the particular sequence $(\alpha_n)_{n \in \mathbb{N}}$, but only the speed of divergence is
different for distinct sequences.

Now, let two distributions $F_1, F_2$ with infinite support be given. Fix two sequences
$\alpha_n$ and $\omega_n$, both vanishing as $n \to \infty$, and set

\[ a_n := \bar{F}_1(\alpha_n) < b_n := \bar{F}_2(\omega_n). \tag{11} \]

Let us approximate $F_1$ by a sequence of truncated distributions $\hat{f}_{1,n}$ with supports
$[0, a_n]$ and let the sequence $\hat{f}_{2,n}$ approximate $f_2$ on $[0, b_n]$. Since $a_n < b_n$ for all $n$, the
proof of lemma 2.4 then implies that the approximation with support $[0, a_n]$ is always
strictly preferable to the distribution with support $[0, b_n]$, thus $\hat{f}_{1,n} \prec \hat{f}_{2,n}$. However,
by replacing the “<” by a “>” in (11), we can construct approximations to $F_1, F_2$
whose supports exceed one another in the reverse way, so that the approximations would
always satisfy $\hat{f}_{1,n} \succ \hat{f}_{2,n}$. It follows that the sequence of approximations cannot be used
to unambiguously compare distributions with infinite support, unless we impose some
constraints on the tails of the distributions and the approximations. The next lemma
assumes this situation to simply not occur, which allows us to give a sufficient condition
to unambiguously extend strict preference in the way we wish.


Lemma 2.7 Let $F_1, F_2$ be two distributions supported on the entire nonnegative real
half-line $\mathbb{R}^+$ with continuous densities $f_1, f_2$. Let $(a_n)_{n \in \mathbb{N}}$ be an arbitrary sequence with
$a_n \to \infty$ as $n \to \infty$, and let $\hat{f}_{i,n}$ for $i = 1, 2$ be the truncated distribution $f_i$ supported on
$[0, a_n]$.

If there is a constant $c < 1$ and a value $x_0 \in \mathbb{R}$ such that $f_1(x) < c \cdot f_2(x)$ for
all $x \ge x_0$, then there is a number $N$ such that all approximations $\hat{f}_{1,n}, \hat{f}_{2,n}$ satisfy
$\hat{f}_{1,n} \prec \hat{f}_{2,n}$ whenever $n \ge N$.


Proof. Throughout the proof, let $i \in \{1, 2\}$. The truncated distribution density that
approximates $f_i$ is $f_i(x)/(F_i(a_n) - F_i(0))$, where $[0, a_n]$ is the common support of the $n$-th
approximation to $f_1, f_2$. By construction, $a_n \to \infty$ as $n \to \infty$, and therefore
$F_i(a_n) - F_i(0) \to 1$ for $i = 1, 2$. Consequently,

\[ Q_n = \frac{F_1(a_n) - F_1(0)}{F_2(a_n) - F_2(0)} \to 1, \quad \text{as } n \to \infty, \]

and there is an index $N$ such that $Q_n > c$ for all $n \ge N$. In turn,

\[ f_2(x) \cdot Q_n > f_2(x) \cdot c > f_1(x), \]

and by rearranging terms,

\[ \frac{f_1(x)}{F_1(a_n) - F_1(0)} < \frac{f_2(x)}{F_2(a_n) - F_2(0)}, \tag{12} \]

for all $x \ge x_0$ and all $n \ge N$. The last inequality (12) lets us compare the two
approximations easily by the same arguments as have been used in the proof of lemma 2.4, and
the claim follows. □

By virtue of lemma 2.7, we can extend the strict preference relation to distributions
that satisfy the hypothesis of the lemma but need not have compact support anymore.
Precisely, we would strictly prefer one distribution over the other if its truncated
approximations are ultimately preferable over those of the other.

Definition 2.8 (Extended Preference Relation $\prec$) Let $F_1, F_2$ be distribution
functions of nonnegative random variables that have infinite support and continuous density
functions $f_1, f_2$. We (strictly) prefer $F_1$ over $F_2$, denoted as $F_1 \prec F_2$, if for every
sequence $a_n \to \infty$ there is an index $N$ so that the approximations $\hat{F}_{i,n}$ for $i = 1, 2$ satisfy
$\hat{F}_{1,n} \prec \hat{F}_{2,n}$ whenever $n \ge N$.

The $\succ$-relation is defined alike, i.e., the ultimate preference of $F_2$ over $F_1$ on any
sequence of approximations.

Definition 2.8 is motivated by the above arguments on comparability on common
supports, and lemma 2.7 provides us with a handy criterion to decide the extended
strict preference relation.
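
The criterion of lemma 2.7 can be tried out on a pair of densities chosen here purely for
illustration: $f_1(x) = 2e^{-2x}$ and $f_2(x) = e^{-x}$, which satisfy $f_1(x) < c \cdot f_2(x)$ with
$c = 1/2$ for all $x \ge x_0 = 2$. The sketch truncates both to $[0, a]$ for a few hypothetical
cutoffs $a$ and compares a finite range of moment orders, using Simpson's rule for the
integrals. A finite check of this kind is only indicative, since the relation $\prec$ concerns
all sufficiently large orders.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def truncated_moment(rate, a, k):
    """k-th moment of an exponential(rate) variable conditioned on [0, a]."""
    mass = 1.0 - math.exp(-rate * a)  # F(a) - F(0)
    integral = simpson(lambda x: x ** k * rate * math.exp(-rate * x), 0.0, a)
    return integral / mass


def moments_smaller(rate1, rate2, a, orders):
    """True if the truncated moments under rate1 are below those under rate2."""
    return all(
        truncated_moment(rate1, a, k) < truncated_moment(rate2, a, k)
        for k in orders
    )


checks = [moments_smaller(2.0, 1.0, a, range(1, 21)) for a in (5.0, 10.0, 20.0)]
```

For this pair, every truncation to a common support yields smaller moments for the first
distribution, in line with the strict preference that the lemma predicts.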

Example 2.9 It is a simple matter to verify that any two out of the three kinds of
extreme value distributions (Gumbel, Fréchet, Weibull) satisfy the above condition, thus
are strictly preferable over one another, depending on their particular parametrization.


Definition 2.8 cannot, however, be applied to every pair of distributions, as the following
example shows.


Example 2.10 Take the “Poisson-like” distributions with parameter $\lambda > 0$,

\[ f_1(k) \propto \begin{cases}
\dfrac{\lambda^{k/2}}{(k/2)!} e^{-\lambda}, & \text{when } k \text{ is even;} \\
0, & \text{otherwise,}
\end{cases}
\qquad
f_2(k) \propto \begin{cases}
0, & \text{when } k \text{ is even;} \\
\dfrac{\lambda^{(k-1)/2}}{((k-1)/2)!} e^{-\lambda}, & \text{otherwise.}
\end{cases} \]

It is a simple matter to verify that no constant $c < 1$ can ever make $f_1 < c \cdot f_2$ and
that all moments exist. However, neither distribution is preferable over the other, since
finite approximations based on the sequence $a_n := n$ will yield alternatingly preferable
approximations.
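
The alternation claimed in example 2.10 can be reproduced in exact arithmetic. The sketch
below fixes $\lambda = 1$ (an arbitrary choice), truncates both distributions to $\{0, 1, \ldots, n\}$,
and compares a single high-order moment as a stand-in for the eventual behavior of the
moment sequences. The order 200 is an assumption that is large enough for the cutoffs
$n \le 8$ used here, since the largest support point then dominates the moment.

```python
from fractions import Fraction
from math import factorial

LAMBDA = Fraction(1)  # assumed parameter value; the factor e^{-lambda} cancels


def weight_even(k):
    """Unnormalized f_1: mass only on even k."""
    if k % 2 == 0:
        return LAMBDA ** (k // 2) / factorial(k // 2)
    return Fraction(0)


def weight_odd(k):
    """Unnormalized f_2: mass only on odd k."""
    if k % 2 == 1:
        return LAMBDA ** ((k - 1) // 2) / factorial((k - 1) // 2)
    return Fraction(0)


def truncated_moment(weight, n, order):
    """Moment of the given order after truncating to {0, ..., n} (n >= 1)."""
    support = range(0, n + 1)
    mass = sum(weight(k) for k in support)
    return sum(Fraction(k) ** order * weight(k) for k in support) / mass


def preferred(n, order=200):
    """Which truncated distribution has the smaller high-order moment."""
    m1 = truncated_moment(weight_even, n, order)
    m2 = truncated_moment(weight_odd, n, order)
    return "f1" if m1 < m2 else "f2"


pattern = [preferred(n) for n in range(1, 9)]
```

Odd cutoffs favor $f_1$ and even cutoffs favor $f_2$, because the truncation point decides
which of the two supports reaches further to the right.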


An occasionally simpler condition that implies the hypothesis of lemma 2.7, and hence
strict preference in the sense of definition 2.8, is

\[ \lim_{x \to \infty} \frac{f_1(x)}{f_2(x)} = 0. \tag{13} \]

The reason is simple: if the hypothesis of lemma 2.7 were violated, then there is an
infinite sequence $(x_n)_{n \in \mathbb{N}}$ for which $f_1(x_n) \ge c \cdot f_2(x_n)$ for all $c < 1$. In that case, there is a
subsequence $(x_{n_k})_{k \in \mathbb{N}}$ for which $\lim_{k \to \infty} f_1(x_{n_k})/f_2(x_{n_k}) \ge c$. Letting $c \to 1$, we can
construct a further subsequence of $(x_{n_k})_{k \in \mathbb{N}}$ to exhibit that
$\limsup_{n \to \infty} (f_1(x_n)/f_2(x_n)) \ge 1$, so that condition (13) would be refuted.


Remark 2.11 It must be emphasized that the above line of arguments does not provide
us with a means to extend the $\preceq$- or $\equiv$-relations accordingly. For example, an attempt to
define $\preceq$ and $\equiv$ as above is obviously doomed to failure, as asking for two densities $f_1, f_2$
to satisfy $f_1(x) \le c_1 \cdot f_2(x)$ ultimately (note the intentional relaxation of $<$ towards $\le$),
and $f_2(x) \le c_2 \cdot f_1(x)$ ultimately for two constants $c_1, c_2 < 1$ is nonsense.

A straightforward extension of $\preceq$ can be derived from (based on) the conclusion of
lemma 2.7:

Definition 2.12 Let $F_1, F_2$ be two distributions supported on the entire nonnegative
real half-line $\mathbb{R}^+$ with continuous densities $f_1, f_2$. Let $(a_n)_{n \in \mathbb{N}}$ be a sequence diverging
towards $\infty$, and let $\hat{F}_{i,n}$ for $i = 1, 2$ denote the distribution $F_i$ truncated to have support
$[0, a_n]$. We define $F_1 \preceq F_2$ if and only if for every sequence $(a_n)_{n \in \mathbb{N}}$ there is some index
$N$ so that $\hat{F}_{1,n} \preceq \hat{F}_{2,n}$ for every $n \ge N$.

More compactly and informally speaking, definition 2.12 demands preference on all
approximations with finite support except for at most finitely many exceptions near the
origin.

Obviously, preference among distributions with finite support implies the extended
preference relation to hold in exactly the same way (since the sequence of approximations
will ultimately become constant when $a_n$ overshoots the bound of the support), so
definition 2.12 extends the $\preceq$-relation in this sense. This observation justifies our choice
of definition 2.12 as a valid extension of $\preceq$ from distributions with compact support to
those with infinite support.

Unfortunately, the extended $\preceq$-relation is not as easy to decide as (extended) strict
preference and usually calls for computing the moment sequences analytically to be able
to compare them in the long run.

Nevertheless, example 2.9 substantiates the expectation that practically relevant
distributions over $\mathbb{R}^+$ may indeed compare w.r.t. $\prec$ or $\succ$ by definition 2.8 or criterion (13).
While it is easy to exhibit distributions with infinite support that $\preceq$-compare in the
sense of definition 2.12, their practical relevance or even occurrence is not guaranteed.
Example 2.13, however, shows that preference relations are indeed non-empty if they are
defined like (9) for distributions with infinite support.

Example 2.13 Let $X \sim F_1$ be a Gaussian random variable. It is well-known that any
Gaussian distribution has finite moments of all orders, so let us call the resulting
sequence $(a_n)_{n \in \mathbb{N}}$. Furthermore, let us construct another sequence $(b_n)_{n \in \mathbb{N}}$ being identical
to $(a_n)_{n \in \mathbb{N}}$, except for finitely many indices $I = \{i_1, i_2, \ldots, i_k\}$ for which we choose
$b_j < a_j$ whenever $j \in I$ and $b_j := a_j$ otherwise. It is easy to see that the existence of
the moment-generating function $\mu_X$ for $F_1$ implies the existence of a moment-generating
function $\mu_Y$ for a random variable $Y \sim F_2$ that has moment sequence $(b_n)_{n \in \mathbb{N}}$ (since the
power series $\mu_X$ dominates the series $\mu_Y$). However, it is equally clear that $\mu_X \ne \mu_Y$.
Thus, $F_1 \ne F_2$ although $F_1 \equiv F_2$, since the mismatch is only on finitely many indices,
and the complement set of these must be in the ultrafilter.

2.6 Interpretation and Implications of Preferences 

Having defined preferences among probability distributions, we now look at what
$F_1 \preceq F_2$ actually means. A simple first impression is gained by considering cases in which the
first few moments match. For that sake, let $F_1, F_2$ be two distributions for which no
preference has been determined so far, and let $R_1 \sim F_1$, $R_2 \sim F_2$.

• If $\mu_1 = E(R_1) < \mu_2 = E(R_2)$, then we prefer the distribution with smaller mean.
That is, decisions that lead to less average risk would be $\preceq$-preferred.

• If the means are equal, then we prefer whatever distribution has smaller second
moment (by virtue of Steiner’s theorem). In other words, the preferred among
$F_1, F_2$ would be the one with smaller variance, or otherwise said, the distribution
whose outcome is “more predictable” in the sense of varying less around its mean.

• Upon equal mean and variance, the third moment would make the difference. This
moment relates to the skewness, and we would prefer the distribution for which $E(X^3)$
is smaller, i.e., the distribution that “leans more to the left”. The respective
distribution would assign more likelihood to smaller outcomes, thus giving less
risk.

We refrain here from extending the above intuitions to cover cases when kurtosis tips
the scale, as the physical meaning of this quantity is debatable and no consensus among
statisticians exists so far. Instead, we give the following result that makes the above
intuitions more explicit in the sense of saying that:

If $F_1 \preceq F_2$, then “extreme events” are less likely to occur under $F_1$ than
under $F_2$.

The rigorous version of this, which especially clarifies the adjective “extreme”, is the
following theorem:

Theorem 2.14 Let $X_1 \sim F_1$, $X_2 \sim F_2$, where $F_1, F_2$ satisfy assumption 1.3. If $F_1 \preceq F_2$,
then there exists a threshold $x_0 \in \mathrm{supp}(F_1) \cup \mathrm{supp}(F_2)$ so that for every $x \ge x_0$, we have
$\Pr(X_1 > x) \le \Pr(X_2 > x)$.

Proof. Let $f_1, f_2$ be the density functions of $F_1, F_2$. Call $\Omega = \mathrm{supp}(F_1) \cup \mathrm{supp}(F_2) = [0, a]$
the common support of both densities, and take $\xi = \inf\{x \in \Omega : f_1(x) = f_2(x) = 0\}$.
Suppose there were an $\varepsilon > 0$ so that $f_1 > f_2$ on every interval $[\xi - \delta, \xi]$ whenever $\delta < \varepsilon$,
i.e., $f_1$ would be larger than $f_2$ until both densities vanish (notice that $f_1 = f_2 = 0$
to the right of $\xi$). Then the proof of lemma 2.4 delivers the argument by which we
would find a $K \in \mathbb{N}$ so that $E(X_1^k) > E(X_2^k)$ for every $k \ge K$, which would contradict
$F_1 \preceq F_2$. Therefore, there must be a neighborhood $[\xi - \delta, \xi]$ on which $f_1(x) \le f_2(x)$
for all $x \in [\xi - \delta, \xi]$. The claim follows immediately by setting $x_0 = \xi - \delta$, since
taking $x \ge x_0$, we end up with $\int_x^\xi f_1(t)\,dt \le \int_x^\xi f_2(t)\,dt$, and for $i = 1, 2$ we have
$\int_x^\xi f_i(t)\,dt = \int_x^\infty f_i(t)\,dt = \Pr(X_i > x)$. □
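
The role of the threshold $x_0$ in theorem 2.14 becomes visible on a toy example that is
not part of the report: $X_1$ uniform on $[2, 2.5]$ and $X_2$ uniform on $[1, 3]$. The
moments of $X_1$ are eventually the smaller ones (its support ends earlier), so $X_1$ is the
preferred distribution, although its mean is larger. The sketch locates, on a grid, the
point from which the tail probabilities of $X_1$ stay below those of $X_2$. The helper names
are hypothetical.

```python
def tail_uniform(x, lo, hi):
    """Pr(X > x) for X uniform on [lo, hi]."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def tail_threshold(tail1, tail2, lo, hi, steps=10_000):
    """Smallest grid point x0 with tail1(x) <= tail2(x) for all grid points x >= x0."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    x0 = hi
    # Scan from the right end and stop at the first violation.
    for x in reversed(grid):
        if tail1(x) <= tail2(x):
            x0 = x
        else:
            break
    return x0


def tail1(x):
    return tail_uniform(x, 2.0, 2.5)  # preferred distribution


def tail2(x):
    return tail_uniform(x, 1.0, 3.0)


x0 = tail_threshold(tail1, tail2, lo=1.0, hi=3.0)
```

Below the threshold, the preferred distribution may well assign more probability to
exceeding a given value. The guarantee only concerns the extreme outcomes beyond $x_0$.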

Observe that this is compatible with the common goals of statistical risk management
[7] in other sectors, such as financial business: the preference relation $\preceq$ compares the
tails of distributions, and optimization w.r.t. $\preceq$ seeks to “push” the mass assigned by a
distribution towards lower damages. Essentially, we thus focus on large deviations
(damages), which intuitively makes sense, as small deviations from the expected behavior may
(most likely) be absorbed by the system’s (designed) natural resilience against distortions.

We close this section by giving a few examples showing graphically how different
distributions compare against each other.


Example 2.15 (different mean, same variance) Consider two Gumbel-distributions 
X ~ F\ = Gumbel (31.0063,1.74346) and Y ~ F 2 = Gumbel( 32.0063,1.74346), where a 
density for Gumbel (a, b) is given by 

f{x\a, b) = 6 , 

where aSl and b > 0 are the location and scale parameter. 



Figure 2: Comparing distributions with different means 

Computations reveal that under the given parameters, the means are $\mathrm{E}(X) = 30$, $\mathrm{E}(Y) = 31$,
and the variances are $\mathrm{Var}(X) = \mathrm{Var}(Y) = 5$. Figure 2 plots the respective densities of $F_1$
(dashed) and $F_2$ (solid line). The respective moment sequences evaluate to

$$\mathrm{E}(X^k) = (30, 905, 27437.3, 835606, 2.55545 \times 10^7, \ldots),$$

$$\mathrm{E}(Y^k) = (31, 966, 30243.3, 950906, 3.00162 \times 10^7, \ldots),$$

thus showing that $F_1 \preceq F_2$. This is consistent with the intuition that the preferred
distribution gives less expected damage.
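
The moment sequences of Example 2.15 can be re-computed numerically. The following Python sketch is illustrative only and rests on assumptions that are not part of the text: it takes the minimum-type Gumbel density $f(x \mid a, b) = \frac{1}{b}\exp(z - e^z)$ with $z = (x - a)/b$ (the form that reproduces the stated means and variances), approximates the raw moments by Simpson's rule, and replaces the asymptotic comparison behind $\preceq$ by a comparison of the first five moments, which is only a heuristic. The names `gumbel_moment` and `moments` are hypothetical.

```python
import math

def gumbel_moment(k, a, b, lo=-40.0, hi=6.0, n=20000):
    """Approximate E(X^k) for X ~ Gumbel(a, b) by Simpson's rule.

    Assumes the minimum-type density f(x) = exp(z - exp(z)) / b with
    z = (x - a) / b. The integral runs over z in [lo, hi]; outside this
    range the integrand is negligible for the parameters used here.
    """
    h = (hi - lo) / n  # n must be even for Simpson's rule

    def g(z):
        return (a + b * z) ** k * math.exp(z - math.exp(z))

    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def moments(a, b, count=5):
    """First `count` raw moments of Gumbel(a, b)."""
    return [gumbel_moment(k, a, b) for k in range(1, count + 1)]

m_x = moments(31.0063, 1.74346)  # X ~ F_1
m_y = moments(32.0063, 1.74346)  # Y ~ F_2

# Heuristic check: every computed moment of X lies below that of Y.
x_preferred = all(mx < my for mx, my in zip(m_x, m_y))
print([round(m, 1) for m in m_x[:3]], x_preferred)
```

A finite number of moments can only suggest the ordering; the relation itself concerns the behavior of the whole sequence from some index onwards.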

Example 2.16 (same mean, different variance) Let us now consider two Gumbel
distributions $X \sim F_1 = \mathrm{Gumbel}(6.27294, 2.20532)$ and $Y \sim F_2 = \mathrm{Gumbel}(6.19073, 2.06288)$,
for which $\mathrm{E}(X) = \mathrm{E}(Y) = 5$ but $\mathrm{Var}(X) = 8 > \mathrm{Var}(Y) = 7$.

Figure 3 plots the respective densities of $F_1$ (dashed) and $F_2$ (solid line). The respective
moment sequences evaluate to

$$\mathrm{E}(X^k) = (5, 33, 219.215, 1654.9, 11957.8, \ldots),$$

$$\mathrm{E}(Y^k) = (5, 32, 208.895, 1517.51, 10806.8, \ldots),$$

thus showing that $F_2 \preceq F_1$. This is consistent with the intuition that among two actions
leading to the same expected loss, the preferred one would be one for which the variation
around the mean is smaller; thus the loss prediction is “more stable”.

Figure 3: Comparing distributions with equal means but different variance


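Example 2.17 (different distributions, same mean and variance) Let us now consider
a situation in which the expected loss (first moment) and variation around the mean
(second moment) are equal, but the distributions are different in terms of their shape.
Specifically, let $X \sim F_1 = \mathrm{Gamma}(260.345, 0.0373929)$ and $Y \sim F_2 = \mathrm{Weibull}(20, 10)$, with
densities as follows:

$$f_{\mathrm{Gamma}}(x \mid a, b) = \begin{cases} \dfrac{b^{-a} x^{a-1} e^{-x/b}}{\Gamma(a)}, & x > 0; \\ 0, & \text{otherwise} \end{cases}$$

$$f_{\mathrm{Weibull}}(x \mid a, b) = \begin{cases} \dfrac{a}{b} \left(\dfrac{x}{b}\right)^{a-1} e^{-(x/b)^a}, & x > 0; \\ 0, & \text{otherwise} \end{cases}$$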

Figure 4: Comparing distributions with matching first two moments but different shapes 

Figure 4 plots the respective densities of $F_1$ (dashed) and $F_2$ (solid line). The respective
moment sequences evaluate to

$$\mathrm{E}(X^k) = (9.73504, 95.1351, 933.259, 9190.01, 90839.7, \ldots),$$

$$\mathrm{E}(Y^k) = (9.73504, 95.1351, 933.041, 9181.69, 90640.2, \ldots),$$

thus showing that $F_2 \preceq F_1$. In this case, going with the distribution that visually “leans
more towards lower damages” would be flawed, since $F_1$ nonetheless assigns larger
likelihood to larger damages. The moment sequence, on the contrary, unambiguously points
out $F_2$ as the preferred distribution. This illustrates Theorem 2.14.
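
For the two families in Example 2.17, raw moments are available in closed form, so the sequences can be checked without numerical integration. The sketch below assumes the shape/scale parametrization, under which $\mathrm{E}(X^k) = b^k\,\Gamma(a + k)/\Gamma(a)$ for $\mathrm{Gamma}(a, b)$ and $\mathrm{E}(Y^k) = b^k\,\Gamma(1 + k/a)$ for $\mathrm{Weibull}(a, b)$; this parametrization is an assumption, chosen because it matches the stated first moment. The helper names are hypothetical, and comparing finitely many moments is again only a heuristic for $\preceq$.

```python
import math

def gamma_moment(k, shape, scale):
    """E(X^k) = scale^k * Gamma(shape + k) / Gamma(shape), via log-gamma."""
    return scale ** k * math.exp(math.lgamma(shape + k) - math.lgamma(shape))

def weibull_moment(k, shape, scale):
    """E(Y^k) = scale^k * Gamma(1 + k / shape)."""
    return scale ** k * math.gamma(1.0 + k / shape)

m_x = [gamma_moment(k, 260.345, 0.0373929) for k in range(1, 6)]  # X ~ F_1
m_y = [weibull_moment(k, 20.0, 10.0) for k in range(1, 6)]        # Y ~ F_2

# The first two moments agree to the printed precision, so the comparison
# starts at the third moment.
y_preferred = all(my < mx for mx, my in zip(m_x[2:], m_y[2:]))
print([round(m, 3) for m in m_x[:3]], [round(m, 3) for m in m_y[:3]], y_preferred)
```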

3 Games with Uncertain Payoffs 

Given a total ordering $\preceq$ on the set of actions as defined in section 2, we can go on lifting
the remaining concepts and results of game theory to our new setting. In particular,
we will have to investigate zero-sum competitions and Nash-equilibria in games whose
payoffs are probability distributions. Before, however, it pays to look at arithmetic in
our chosen subset of hyperreal numbers that represent our payoff distributions. It turns
out that things cannot be carried over straightforwardly, as we will illustrate in the
next section.

3.1 Arithmetic in $\mathfrak{F} \subset \mathbb{R}^\infty/\mathcal{U}$

The space $\mathbb{R}^\infty/\mathcal{U}$ is elsewhere known as the set of hyperreal numbers. Together with
the ordering relation defined in the same way as (9), and because $\mathcal{U}$ is an ultrafilter,
$\mathbb{R}^\infty/\mathcal{U}$ is actually a field, and in many ways behaves like the real numbers. This is because
the ultrafilter acts in much the same way as a maximal ideal when the quotient structure is
formed. For example, we can soundly define min- and max-operators on $\mathfrak{F}$. Furthermore,
we can add and subtract elements from $\mathfrak{F}$ in the canonical way by applying the respective
operation pairwise on the sequences’ elements. Likewise, we can define an absolute-value
function $|x| = (|x_i|)_{i \in \mathbb{N}}$ on the sequences, which naturally satisfies the triangle inequality
because the sequence’s elements are from $\mathbb{R}$. However, we stress that the absolute value
does not induce a metric on $\mathfrak{F}$ (even though it satisfies the necessary conditions), as the
absolute value under this definition is not real-valued. This is one difference to the field
$\mathbb{R}$.

A more important difference is the observation that any probability distribution
satisfying assumption 1.3 can be represented by an element in $\mathfrak{F}$; but the converse is not
true! For instance, given $F \in \mathfrak{F}$ as a representative of some probability distribution, the
element $(-F)$, being the sequence of moments of $F$ with negative signs, does not
represent a distribution (in general, and specifically under assumption 1.3). Neither is
the sum of two moment sequences necessarily the moment sequence of some other probability
distribution. Finally, observe that the zero element $0 = (0, 0, \ldots)$ does not define a proper
probability distribution. Hence, the concept of a “zero-sum game” must be replaced by the
(strategically equivalent) concept of a constant-sum game, to properly define things. This issue
will not be of any particular importance in the following.

Our proofs will nevertheless heavily rely on the existence of a well-defined ordering
and arithmetic on the subset $\mathfrak{F}$ of the hyperreals. The fact that, for lack of an explicit
representation of $\mathcal{U}$, we cannot do arbitrary arithmetic somewhat limits the candidate
algorithms to analyze the games and compute equilibria and security strategies; however,
this limitation is not severe and can be overcome in our context of application.

3.2 Continuity of $F(p,q)(r)$ in $(p,q)$

The existence of Nash-equilibria crucially hinges on the continuity of payoff functionals
(in the classical setting). The existence of a topology on the hyperreal set that we
consider lets us soundly define continuity in terms of the topology, but proving our
payoff distribution function (3) to be continuous is so far an open issue, and this gap
shall be closed now.

To establish continuity of the distribution-valued utility function $u(p,q) := F(p,q)$,
we have to show that any set in the topology $\mathcal{T}$, i.e., any open set in $\mathfrak{F}$, has a preimage
under $u$ that is open in $S_1 \times S_2$ w.r.t. the product topology. The following lemma
establishes the important steps towards this conclusion by exploiting the ordering and
arithmetic within $\mathfrak{F}$. Hereafter, we consider $(\mathfrak{F} \subset \mathbb{R}^\infty/\mathcal{U}, \mathcal{T}, \prec)$ as an ordered topological
space induced by an arbitrary ultrafilter $\mathcal{U}$.

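Lemma 3.1 Let $r_1, \ldots, r_k \in \mathfrak{F}$ for $k \geq 1$ be a set of fixed elements, and take $\alpha =
(\alpha_1, \ldots, \alpha_k) \in \mathbb{R}^k$. If two elements $\ell, u \in \mathfrak{F}$ bound the weighted sum
$\ell \prec \sum_{i=1}^k \alpha_i r_i = \alpha^T r \prec u$, then there is some strictly positive $\delta \in \mathbb{R}$ so that
$\ell \prec \alpha'^T r \prec u$ for every $\alpha'$ within a $\delta$-neighborhood of $\alpha$ in $\mathbb{R}^k$.

Proof. Define $\Delta := \min\{\alpha^T r - \ell,\; u - \alpha^T r\} \succ 0$ and $\bar{r} := \max\{r_1, \ldots, r_k\}$. Suppose
that we modify all weights $\alpha_i$ to $\alpha_i + \delta_i =: \alpha_i'$. If so, then the so-modified sum differs
from the given one by $|\alpha'^T r - \alpha^T r| \preceq \sum_{i=1}^k |\delta_i|\, r_i \preceq \bar{r} \cdot \sum_{i=1}^k |\delta_i|$. Now, suppose that all
$|\delta_i| < \delta$; then the change alters $\alpha^T r$ by a magnitude of no more than
$\bar{r} \cdot \sum_{i=1}^k \delta = \bar{r} \cdot k \cdot \delta$.
As $k$ and $\bar{r}$ are fixed, we can choose $\delta$ sufficiently small to satisfy $\bar{r} \cdot k \cdot \delta \prec \Delta$, in which
case we must have $|\alpha'^T r - \alpha^T r| \prec \Delta$, and therefore $\ell \prec \alpha'^T r \prec u$ for any choice of $\alpha'$
within a $\delta$-neighborhood of $\alpha$ in the maximum norm on $\mathbb{R}^k$. □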

By virtue of lemma 3.1, continuity of $F(p, q)$ is easily implied by the continuity of the
weights $C_{p,q}(i,j)$ in $(p,q)$.

Proposition 3.2 Let $i, j$ be integers and define the function $D_{ij} : S_1 \times S_2 \to \mathbb{R}$ as
$D_{ij}(p,q) = C_{p,q}(i,j) = \Pr_{p,q}(i,j)$. If $D_{ij}$ is continuous and all $F_{ij}$ satisfy
assumption 1.3, then the mapping $F : S_1 \times S_2 \to \mathfrak{F}$, $(p, q) \mapsto \sum_{i,j} C_{p,q}(i,j) F_{ij}$, is continuous
w.r.t. the product topology on $S_1 \times S_2$ and the order topology on $\mathfrak{F}$.

Proof. Without a metric on $\mathfrak{F}$, we need to show that the preimage of every open set
in $\mathfrak{F}$ under $F$ is open to prove that $F$ is continuous. For that sake, let the open set
$(\ell, u) \in \mathcal{T}$ be arbitrary and contain some point $F(p, q)$ (such a point may be assumed to
exist, for otherwise the preimage would be empty and hence trivially open). To ease notation,
let us flatten the double sum into an ordinary sum (say, by introducing a multiindex $\nu$) over
$k = n \cdot m$ elements, where $n, m$ are the limits in the original expression. Then, the mapping
takes the form $F(p, q) = \sum_{\nu=1}^k D_\nu(p, q) F_\nu$. With the weights $\alpha$ being defined by the individual
values of $D_\nu(p, q) = C_{p,q}(\nu)$, we can apply lemma 3.1 to establish a bound $\delta > 0$ within which
we can arbitrarily alter the weights towards $\alpha'$ without leaving the open set $(\ell, u)$. Since
$C$ is continuous on the compact set $S_1 \times S_2$, it is also uniformly continuous, and we can fix a
$\delta' > 0$ so that $\|D_\nu(p', q') - D_\nu(p, q)\| < \delta$ whenever $\|(p, q) - (p', q')\| < \delta'$,
independently of the particular point $(p, q)$. The sought preimage of the open set $(\ell, u)$ is thus
the (infinite) union of open neighborhoods constructed in the way described, and thus
itself open. □


3.3 Security Strategies and Zero-Sum Games 

Given the continuity and ordering of payoffs in games that reward players with random
variables, our next step is the definition of zero-sum games in this context. We rephrase
the standard definition of a zero-sum equilibrium using the preference relation $\preceq$ in the
straightforward fashion, by defining a two-person game $\Gamma_0 = (\{1, 2\}, \{S_1, S_2\}, \{F, -F\})$
as usual, but keeping the following in mind:

• When $(a_i)_{i=1}^\infty$ defines the probability distribution $F$, then $-F$ is defined by $(-a_i)_{i=1}^\infty$
but does not necessarily define a valid probability distribution any more. To see this,
simply recall that the Taylor series (6) would, upon all negative moments, define a
negative-valued function $\mu_X(s) < 0$, which cannot be a moment-generating function,
since $\mu_X(s) = \mathrm{E}(e^{sX}) > 0$ in any case.

• The sum $F + (-F)$ being computed in $\mathbb{R}^\infty/\mathcal{U}$ is defined as the sequence that is
constantly zero. Again, this does not define a probability distribution in the proper
sense. However, strategic equivalence (as in the classical theory of games) tells us that the
set of equilibria does not change if the payoffs of both players get a constant value
added to them (in that case, the payoffs on either side are changed by the same
value, leaving all inequalities intact). By the same token, we may think of constant-sum
games, which avoid degenerate cases as above (where two distributions add
up to something that is no longer a distribution).

The familiar equilibrium condition in a two-player zero-sum game $\Gamma$ can be rephrased
as follows: a strategy profile $(p^*, q^*) \in S_1 \times S_2$ is a (Nash-)equilibrium, if for every
$(p, q) \in S_1 \times S_2$,

$$F(p, q^*) \preceq F(p^*, q^*) \preceq F(p^*, q), \qquad (14)$$

i.e., any deviation from the optimal profile $(p^*, q^*)$ would worsen the situation of either
player (in either a zero- or constant-sum competition).
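
Condition (14) can be checked mechanically for a small finite game. The sketch below is a simplified illustration under several assumptions that go beyond the text: actions are chosen independently, so that $F(p, q) = \sum_{i,j} p_i q_j F_{ij}$ and the moments of the mixture are the correspondingly weighted sums; each $F_{ij}$ is represented by a truncated moment sequence; and $\preceq$ is replaced by a comparison of the last moment kept, which is only a heuristic for the asymptotic ordering. Under this heuristic the payoff is linear in $p$ and in $q$, so testing pure-strategy deviations suffices. The point-mass payoffs serve as toy data and need not satisfy assumption 1.3; all names are hypothetical.

```python
from typing import List, Sequence

Moments = List[float]        # truncated moment sequence (m_1, ..., m_K)
Game = List[List[Moments]]   # Game[i][j] holds the moments of F_ij

def mixed_payoff(p: Sequence[float], q: Sequence[float], F: Game) -> Moments:
    """Moments of F(p, q) = sum_ij p_i * q_j * F_ij (independent choices)."""
    K = len(F[0][0])
    return [sum(p[i] * q[j] * F[i][j][k]
                for i in range(len(p)) for j in range(len(q)))
            for k in range(K)]

def preceq(m1: Moments, m2: Moments, tol: float = 1e-9) -> bool:
    """Heuristic stand-in for the ordering: compare the last moment kept."""
    return m1[-1] <= m2[-1] + tol

def unit(i: int, size: int) -> List[float]:
    """Pure strategy i written as a degenerate mixed strategy."""
    return [1.0 if t == i else 0.0 for t in range(size)]

def is_saddle_point(p_star: Sequence[float], q_star: Sequence[float],
                    F: Game) -> bool:
    """Check condition (14) against all pure-strategy deviations."""
    v = mixed_payoff(p_star, q_star, F)
    n, m = len(F), len(F[0])
    rows_ok = all(preceq(mixed_payoff(unit(i, n), q_star, F), v)
                  for i in range(n))
    cols_ok = all(preceq(v, mixed_payoff(p_star, unit(j, m), F))
                  for j in range(m))
    return rows_ok and cols_ok

def point_mass(a: float, K: int = 3) -> Moments:
    """Moments (a, a^2, ..., a^K) of a point mass at a (toy data only)."""
    return [a ** k for k in range(1, K + 1)]

F = [[point_mass(1.0), point_mass(3.0)],
     [point_mass(3.0), point_mass(1.0)]]

print(is_saddle_point([0.5, 0.5], [0.5, 0.5], F))  # uniform mixing
print(is_saddle_point([1.0, 0.0], [1.0, 0.0], F))  # a pure profile
```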

Before security strategies can be defined properly, we need to assure existence of
equilibrium profiles in our modified setting. In this regard, Glicksberg’s theorem, which
generalizes Nash’s original theorem, helps out:

Theorem 3.3 (see [4] and [3, Theorem 1.3]) Consider a strategic-form game whose
strategy spaces $S_i$ are nonempty compact subsets of a metric space. If the payoff
functions are continuous w.r.t. the metric, then there exists a Nash-equilibrium in mixed
strategies.

It is a simple matter to verify that

• both sets $PS_1, PS_2$ are finite subsets of $\mathbb{R}$ (or $\mathbb{R}^d$ for $d > 1$) and hence compact
w.r.t. all norms on this Euclidean space, and

• the payoff functions are continuous by proposition 3.2,

so that a Nash-equilibrium in the sense of (14) exists by theorem 3.3.

Furthermore, any saddle-point satisfying (14) defines the same payoff distribution in
the sense of possibly defining a different representative but in any case pinning down
the same equivalence class of distributions in $\mathbb{R}^\infty/\equiv$, where $\equiv$ is defined by (10). The
proof is a restatement of Theorem 3.12 in [11].

Lemma 3.4 Let a continuous function $F : PS_1 \times PS_2 \to \mathfrak{F}$ be given, where $PS_1 \subset
\mathbb{R}^{n_1}$, $PS_2 \subset \mathbb{R}^{n_2}$. Furthermore, let $(p', q')$ and $(p^*, q^*)$ be two different saddle-points.
Then, $(p^*, q')$ and $(p', q^*)$ are also saddle-points, and

$$F(p', q') = F(p^*, q^*) = F(p^*, q') = F(p', q^*).$$

Proof. The proof is by direct checking of the saddle-point condition, i.e.,

$$F(p^*, q') \preceq F(p', q') \preceq F(p', q^*) \preceq F(p^*, q^*) \preceq F(p^*, q')$$

$$\Rightarrow F(p', q') = F(p^*, q^*) = F(p', q^*) = F(p^*, q').$$

The profile $(p', q^*)$ is a saddle-point, since for every $(p, q)$,

$$F(p, q^*) \preceq F(p^*, q^*) = F(p', q^*) = F(p', q') \preceq F(p', q).$$

The fact that $(p^*, q')$ is a saddle-point is proved analogously. □

Lemma 3.4 permits calling $v = F(p^*, q^*)$ the saddle-point value of the zero-sum game
$\Gamma_0$. With this, we are ready to step forward towards defining security strategies.

For security strategies in the general case of two-person games with arbitrary payoffs,
let us denote the general game by $\Gamma = (\{1, 2\}, \{S_1, S_2\}, \{F, G\})$, in which player 1 has
payoff structure $F$, and player 2 has payoff structure $G$. Let $\Gamma_0$ denote the associated
zero-sum competition that, adopting a worst-case assumption, substitutes an unknown
$G$ by $(-F)$, i.e., $\Gamma_0 = (\{1, 2\}, \{S_1, S_2\}, \{F, -F\})$.

Theorem 3.5 Let $\Gamma$ be an arbitrary two-person game, and let $\Gamma_0$ be its associated
zero-sum competition with equilibrium profile $(p^*, q^*)$. Then, for every $(p, q) \in S_1 \times S_2$, we
have

$$v \preceq F(p, q), \qquad (15)$$

and the strategy $q^*$ achieves equality in (15).

Proof. Observe that the payoff $F(p, q)$ is the same for player 1 in both games $\Gamma$ and
$\Gamma_0$. So, if player 1 follows an equilibrium profile $(p^*, q^*)$ of $\Gamma_0$, then the saddle-point
condition (14) yields

$$v = F(p^*, q^*) \preceq F(p^*, q), \qquad (16)$$

for every $q$, with equality being achieved by $q^*$, obviously. Since player 2 will play to the
best of its own benefit in $\Gamma$, call the equilibrium profile in $\Gamma$ (which exists by theorem
3.3) $(p, q)$. In $\Gamma$, however, player 1 deviates by playing $p^*$, thus increasing the payoff for
player 2. Thus, we can continue inequality (16) on the right side towards

$$F(p^*, q) \preceq F(p, q). \qquad (17)$$

The theorem is now immediate from expressions (16) and (17). □


4 Optimizing Multiple Security Goals 

In case of multiple goals to be defended, we turn the two conclusions of theorem 3.5 for 
scalar-valued games into two axioms on vector-valued games. This leads to the following 
definition from [9]: 

Definition 4.1 (Multi-Goal Security Strategy with Assurance) 

A strategy $p^* \in S_1$ in a two-person multi-criteria game with continuous payoff $u_1 :
S_1 \times S_2 \to \mathfrak{F}^d$ for the service provider (player 1), is called a Multi-Goal Security Strategy
with Assurance (MGSS) with assurance $v = (V_1, \ldots, V_d) \in \mathfrak{F}^d$ if two criteria are met:

Axiom 1: Assurance The values in $v$ are the component-wise guaranteed payoff for
player 1, i.e., for all components $i$, we have

$$V_i \preceq u_1^{(i)}(p^*, q) \quad \forall q \in S_2, \qquad (18)$$

with equality being achieved by at least one choice $q_i \in S_2$.

Axiom 2: Efficiency At least one assurance becomes void if player 1 deviates from $p^*$
by playing $p \neq p^*$. In that case, some $q_p \in S_2$ exists (that depends on $p$) such that

$$u_1(p, q_p) \preceq_1 v. \qquad (19)$$

4.1 Characterization and Existence of Security Strategies 

The existence of MGSS in the sense of definition 4.1 hinges on a few basic facts about 
continuous real-valued functions. Fortunately, it turns out that the only ingredient 
needed is uniform continuity of payoffs on compact strategy spaces. The precise fact 
used to establish the existence of multi-criteria security strategies is the following [9]: 

Let $u_1 : PS_1 \times PS_2 \to \mathbb{R}^d$ be player 1’s payoff function, and let it be continuous.
Since $PS_1 \times PS_2$ is compact, given any $\varepsilon > 0$, we can find a $\delta > 0$ such
that $\|u_1(x, y) - u_1(x', y')\|_\infty < \varepsilon$ whenever $\|(x, y) - (x', y')\|_\infty < \delta$.

This argument can be transferred easily to our setting, by a simple inspection of the 
proofs of lemma 3.1 and proposition 3.2. 





Proposition 3.2 tells that $F : S_1 \times S_2 \to \mathfrak{F}$ is continuous w.r.t. the topologies on
$\mathbb{R}^{|PS_1| \cdot |PS_2|}$ and the order topology on $\mathfrak{F}$. So, let $(-\varepsilon, +\varepsilon)$ for $0 \prec \varepsilon \in \mathfrak{F}$ be an open
interval; then we can find some real $\delta > 0$ such that whenever $\|(p, q) - (p', q')\|_\infty < \delta$,
we have $-\varepsilon \prec F(p, q) - F(p', q') \prec \varepsilon$ by construction of $\delta$ (see the proofs of lemma
3.1 and proposition 3.2). More importantly, the $\delta$ is constructed only from $\varepsilon$ but is
independent of $(p, q)$. Hence, $F$ is indeed uniformly continuous$^6$. For vector-valued
payoffs $F : (p, q) \mapsto (F^{(1)}(p, q), \ldots, F^{(d)}(p, q))$, uniform continuity is inherited in the
canonical way.

Furthermore, we need a proper replacement for the $\infty$-norm on $\mathbb{R}^d$, which will work
on elements $x \in \mathfrak{F}^d$. This replacement is $\llbracket x \rrbracket_\infty = \llbracket (x_1, \ldots, x_d) \rrbracket_\infty := \max\{|x_1|, \ldots, |x_d|\}$
for $(x_1, \ldots, x_d) \in \mathfrak{F}^d$, which “resembles” the $\infty$-norm on the real space. The slight
difference in the notation shall highlight the fact that $\llbracket \cdot \rrbracket_\infty$ is technically not a norm, as
it maps onto elements of $\mathfrak{F}$ rather than real numbers.

Lemma 4.2 is proved here for the sake of rigor, but is the only part from [9] that 
requires a reconsideration. The main result needed here will be theorem 4.4, whose 
proof will then rest on our version of lemma 4.2. 

Lemma 4.2 Let $\Gamma$ be a multi-criteria game, and let $p^*$ be a multi-goal security
provisioning strategy with assurance $v$, assuming that one exists. Then, no vector $\tilde{v} \prec v$ is
an assurance for $p^*$.

Proof. Let $\tilde{v} \prec v$, put $\varepsilon := \min_{1 \leq i \leq k} \{V_i - \tilde{V}_i\}$ and observe that $\varepsilon \succ 0$. Since $F$ is
uniformly continuous, a $\delta > 0$ exists for which $\|(p, q) - (p', q')\|_\infty < \delta$ implies
$\llbracket F(p, q) - F(p', q') \rrbracket_\infty \prec \frac{\varepsilon}{2}$.

Consider the mapping $u_q : S_1 \to \mathbb{R}^k$, $u_q(p) := F(p, q)$, which is as well uniformly
continuous on $S_1$ by the same argument. So, $\|(p^*, q) - (p', q)\|_\infty = \|p^* - p'\|_\infty < \delta$
implies $\llbracket u_q(p^*) - u_q(p') \rrbracket_\infty = \max_{1 \leq i \leq k} |F^{(i)}(p^*, q) - F^{(i)}(p', q)| \prec \frac{\varepsilon}{2}$ $\forall q \in S_2$. It
follows that $|F^{(i)}(p^*, q) - F^{(i)}(p', q)| \prec \frac{\varepsilon}{2}$ for $i = 1, \ldots, k$ and all $q \in S_2$, and
consequently $\max_{q \in S_2} |F^{(i)}(p^*, q) - F^{(i)}(p', q)| \prec \frac{\varepsilon}{2}$. Now, selecting any $p' \neq p^*$ within a
$\delta$-neighborhood of $p^*$, we end up asserting $F^{(i)}(p', q) \succeq F^{(i)}(p^*, q) - \frac{\varepsilon}{2}$ for every $i$ and
$q \in S_2$.

Using $F^{(i)}(p^*, q) \succeq V_i$, we can continue by saying that $F^{(i)}(p', q) \succeq V_i - \frac{\varepsilon}{2} \succ V_i - \varepsilon$.
By definition of $\varepsilon$, we have $V_i - \tilde{V}_i \succeq \varepsilon$, so that $F^{(i)}(p', q) \succ \tilde{V}_i$ for all $i$, contradicting
(19) if $\tilde{v}$ were to be a valid assurance vector. □

To compute MGSS, we apply a simple trick: we cast the two-person game in which
player 1 pursues $d$ goals into a $(d + 1)$-person game in which player 1 defends himself
against $d$ adversaries, each of which refers to a single security goal. The scenario is a
“one-against-all” game, for which numerical solution techniques (e.g., fictitious play) are
known. This is the subject of upcoming companion work.

6 The definition on topological spaces (without relying on a metric) is the following: a function $f : X \to Y$
is uniformly continuous, if for any neighborhood $B$ of zero in $Y$, there is a neighborhood $A$ of zero in
$X$ so that $x - y \in A$ implies $f(x) - f(y) \in B$. This definition is satisfied by our “distribution-valued”
function $F : S_1 \times S_2 \to \mathfrak{F}$.

Definition 4.3 (Auxiliary Game) Let a multiobjective game

$$\Gamma = (\{1, 2\}, \{S_1, S_2\}, \{F_1, F_2\})$$

be given, where player 1 receives $d \geq 1$ outcomes through the (known) payoff $F_1 =
(F_1^{(1)}, \ldots, F_1^{(d)})$. Assume $F_2$ to be unknown. We define the $(d + 1)$-player multiobjective
game $\overline{\Gamma} = (N, S, H)$ as follows:

• $N := \{0, 1, \ldots, d\}$ is the player set,

• $S := \{S_1, S_2, \ldots, S_2\}$ is the strategy multiset containing $d$ copies of $S_2$ (one for
each opponent in $N$),

• the payoffs are

  - vector-valued for player 0, who gets

    $$\overline{F}_0(s_0, \ldots, s_d) := (F_1^{(1)}(s_0, s_1), \ldots, F_1^{(d)}(s_0, s_d)),$$

  - scalar for all opponents $i = 1, 2, \ldots, d$, receiving

    $$\overline{F}_i(s_0, \ldots, s_d) := -F_1^{(i)}(s_0, s_i).$$

The game $\overline{\Gamma}$ is called the auxiliary game for $\Gamma$.

Theorem 4.4 Let $\Gamma$ be a two-player multi-objective game with $d \geq 1$ distribution-valued
payoffs. The situation $p^*$ constitutes a network provisioning strategy with assurance $v$
for player 1 in the game $\Gamma$, if and only if it is a Pareto-Nash equilibrium strategy for
player 0 in the auxiliary $(d + 1)$-player game $\overline{\Gamma}$ according to definition 4.3.

Proof (Sketch). The proof from [9] transfers with obvious changes to our setting, except
for the above version of Lemma 4.2 being used in the last step. □

Theorem 4.4 equates the set of multi-goal security strategies to the set of Pareto-Nash 
equilibria in a conventional game. Existence of such equilibria is assured by the following 
theorem: 

Theorem 4.5 ([6]) Let $\Gamma = (\{1, \ldots, p\}, \{S_1, \ldots, S_p\}, \{F_1, \ldots, F_p\})$ be a $p$-player
multiobjective game, where $S_1, \ldots, S_p$ are convex compact sets and $F_1, \ldots, F_p$ represent
vector-valued continuous payoff functions (where the payoff for player $i$ is composed from
$r_i \geq 1$ values). Moreover, let us assume that for every $i \in \{1, 2, \ldots, p\}$ each component
$F_i^{(k)}(s_1, s_2, \ldots, s_{i-1}, s_i, s_{i+1}, \ldots, s_p)$, $k \in \{1, 2, \ldots, r_i\}$, of the vector function $F_i$
represents a concave function w.r.t. $s_i$ on $S_i$ for fixed $s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_p$. Then the
multiobjective game $\Gamma$ has a Pareto-Nash equilibrium.

It is almost straightforward to apply theorem 4.5, since almost all conditions have been
verified already: we have $p = 2$ players, whose (vector-valued) payoffs are continuous
by proposition 3.2, transferred canonically to the vector-valued case (which means that
player 0 in the auxiliary game has $r_0 = d$ payoffs, and every opponent $i = 1, \ldots, d$ has
$r_i = 1$ payoff). Likewise, the action spaces $PS_1, PS_2$ that we consider are finite subsets
of $\mathbb{R}$, and hence the simplices of discrete probability distributions $S_1, S_2$ are compact and
convex sets. However, it remains generally open whether or not the payoff functions are
concave. Under independent choices of actions (cf. section 2.1), this is assured, and
theorem 4.5 applies. However, if the actions are chosen interdependently, i.e., we have
a nontrivial copula modeling the interplay, concavity of the payoffs must be determined
upon the explicit structure of (4).

4.2 Relation to Bayesian Decisions 

To embed the minimax-like decision finding that we described in a Bayesian framework,
recall that a Bayesian decision is one that is optimal w.r.t. the a-posteriori
loss-distribution that incorporates all information. Informally speaking, such decisions
naturally give rise to minimax-decisions, if the loss-distribution is the least favourable
one. Our minimax approach, on the contrary, has the opponent player 2 look exactly
for this least favourable distribution, and the zero-sum assumption then implies that
a multi-goal security strategy in the sense of definition 4.1 can be viewed as a Bayesian decision
w.r.t. the Pareto-Nash opponent-strategy in the “zero-sum” auxiliary game associated
with our multi-criteria competition (cf. theorem 4.5).

A full-fledged treatment of Bayesian decision theory can be found in [10]. We abstain
from transferring this framework to our setting, as there appears to be no immediate
benefit in doing so, due to the inherent lack of information that risk management suffers from here.
In other words, while Bayesian decisions heavily rely on data, such data is not usually
available in the context of security and defenses against unknown attackers. Attacks like
eavesdropping are intrinsically unobservable (in most practically relevant cases), and the
consequences may be observed only with delay and fuzziness.

Summing up, minimax decisions as we compute them here may indeed be pessimistic
relative to a “more informed” Bayesian decision. Under the expected lack of information
that risk management often suffers from, however, it is nevertheless the best that we can
do (theoretically).

5 Compiling Quantitative Risk Measures 

The outcome of the game-theoretic analysis is in any case two-fold, consisting of: 

• An optimal choice rule $p^*$ over the set of actions $PS_1$, and

• A lower-bound distribution $V^*$ (or vector $v$ if we optimize multiple goals as in
section 4) for the random payoff that can be obtained from the game. This payoff
is optimal in the sense of not being improvable without risking the existence of
an attack strategy that causes greater damage than predicted by $V^*$. This bound
is valid if and only if actions are drawn at random from $PS_1$ according to the
distribution $p^*$.

While the optimal action choice distribution $p^*$ is easy to interpret, compiling crisp
risk figures from the payoff value $V^*$ (or a vector $v$ thereof) requires some more thought.
A common approach to risk quantisation is by the well-known “rule-of-thumb”

risk = (incident likelihood) × (damage caused by the incident). (20)

The beauty of this formula lies in its compatibility with any nominal or numerical scale
of likelihoods and damages, while at the same time, it enjoys a rigorous mathematical
foundation, part of which is game theory.

Indeed, formula (20) is essentially the expected value of the loss-distribution that is
specified by the damage potential of all known incidents, together with their likelihoods.
The distribution $V^*$ that we obtain from our analysis of games with distribution-valued
payoffs is much more general and thus informative: let $V^*$ be the optimal distribution;
then:

• Formula (20) is merely the first moment of $V^*$, i.e.,

risk = likelihood × damage = $\mathrm{E}(R)$, when $R \sim V^*$,

where the last quantity is equal to $\mathrm{E}(V^*) = \mathrm{E}(R^1)(p^*, q^*)$, which can be computed
from equation (5). The missing value $q^*$ is here exactly the optimal strategy for the
attacker in the hypothetical zero-sum competition that is set up to compute the
sought security strategy $p^*$. In other words, the value $q^*$ is a natural by-product
of the computation of $p^*$ and delivered together with it.

• Beyond the crisp result that formula (20) delivers, the distribution $V^*$ can be
analyzed for higher moments too, such as variance of the damage, or quantiles that
would provide us with probabilistic risk bounds: for example, computing the 5%-
and 95%-quantiles of $V^*$ gives two bounds within which the damage will range with a
90% likelihood. This may be another interesting figure for decision support, which
cannot be obtained in the classical way via formula (20).

If the results refer to an MGSS, then the above reasoning holds for every component of the
assurance vector $v = (V_1^*, \ldots, V_d^*)$. That is, risk figures can be computed independently
for every aspect of interest.
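
As an illustration of the figures listed above, the following Python sketch derives the first moment (the value that formula (20) delivers), the variance, and the 5%- and 95%-quantiles from a loss distribution. It assumes, purely for the example, that the distribution is available in discretized form as sorted loss values with probabilities; the function name `risk_figures` and the numbers are hypothetical and do not come from the analysis in this report.

```python
from typing import Sequence, Tuple

def risk_figures(losses: Sequence[float], probs: Sequence[float],
                 lower: float = 0.05, upper: float = 0.95
                 ) -> Tuple[float, float, float, float]:
    """Return (mean, variance, lower quantile, upper quantile) of a
    discrete loss distribution; losses must be sorted in ascending order."""
    mean = sum(x * p for x, p in zip(losses, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(losses, probs))

    def quantile(level: float) -> float:
        cum = 0.0
        for x, p in zip(losses, probs):
            cum += p
            if cum >= level - 1e-12:  # smallest loss whose CDF reaches level
                return x
        return losses[-1]

    return mean, var, quantile(lower), quantile(upper)

# Hypothetical discretized loss distribution standing in for V*.
mean, var, q05, q95 = risk_figures([1, 2, 3, 4, 5], [0.1, 0.2, 0.4, 0.2, 0.1])
print(mean, var, q05, q95)
```

For an MGSS, the same computation would be repeated for each component of the assurance vector.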

Remark 5.1 The entries in the optimal attack strategy $q^*$ are an optimal choice rule
over the set of attacker’s actions $PS_2$. As such, they can be taken as indicators of
neuralgic spots in the infrastructure. However, it must be emphasized that equilibria,
and hence also security strategies, are notoriously non-unique. Therefore, the indication
by $q^*$ is only one among many other possible ones, and thus must not be used in isolation
from or as a substitute for other/further information and expertise.

Remark 5.2 As an alternative quantity of interest, one may ask for the expected
maximum repair costs over a duration of (unchanging) infrastructure provisioning $p$ and risk
situation $q$. In adopting such an approach, we can put $M_n := \max\{R_1, \ldots, R_n\}$, but
then find

$$\Pr(M_n \leq r) = \Pr(R_1 \leq r, R_2 \leq r, \ldots, R_n \leq r)$$

$$= \Pr(R_1 \leq r) \Pr(R_2 \leq r) \cdots \Pr(R_n \leq r) = [(F(p, q))(r)]^n,$$

if the repairs induce independent costs. Since $(F(p, q))(r) \leq 1$ for all $r$ by definition of
a distribution function, we end up concluding that the long-run distribution function of the
maximum is either zero or one, as $[(F(p, q))(r)]^n \to 0$ if $(F(p, q))(r) < 1$, or remains
$(F(p, q))(r) = 1$ otherwise.

So the maximum is not as informative as we may hope under the assumptions made.
Nevertheless, modeling maxima is indeed the proper way to control risk, and theorem
2.14 fits our $\preceq$-relation and framework quite well into this classical line of approaches.
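
The degenerate limit in Remark 5.2 is easy to observe numerically. The short Python sketch below assumes independent and identically distributed repair costs and takes the single-period value $\Pr(R \leq r)$ as given; the function name `cdf_of_maximum` and the value 0.99 are hypothetical choices for illustration.

```python
def cdf_of_maximum(cdf_at_r: float, n: int) -> float:
    """Pr(M_n <= r) for n independent, identically distributed costs,
    given the single-period value Pr(R <= r) = cdf_at_r."""
    return cdf_at_r ** n

# Any value below 1 is driven towards 0 as n grows; the value 1 stays at 1.
for n in (1, 10, 100, 1000):
    print(n, cdf_of_maximum(0.99, n), cdf_of_maximum(1.0, n))
```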

6 Outlook 

So far, various practical issues have been left untouched, which will be covered in
companion work to this report. In particular, future discussions will include:

• Methods and models to capture extreme events (distributions commonly used in
quantitative risk management)

• Methods and algorithms to compile payoff distributions from simulation or
empirical data

• Algorithms to efficiently decide preference and equivalence among probability
distributions

• Algorithms to numerically compute security strategies that account for the limited
arithmetic that we can do for lack of an explicit model of the hyperreal structure
that represents our distributions.

This report is meant to provide the theoretical foundation to build the practical
analysis methods that are described in follow-up work. In that sequel to this research, issues
of modeling extreme events and damage distributions for a game-theoretic risk control
will be discussed.